# How Protobuf supports multiple C++ build systems

This document explains how the Protobuf project supports multiple C++ build
systems.

## Background

Protobuf primarily uses [Bazel](https://bazel.build) to build the Protobuf C++
runtime and Protobuf compiler[^historical_sot]. However, there are several
different build systems in common use for C++, each one of which requires
essentially a complete copy of the same build definitions.

[^historical_sot]:
  On a historical note, prior to its [release as Open Source
  Software](https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html),
  the Protobuf project was developed using Google's internal build system, which
  was the predecessor to Bazel (the vast majority of Google's contributions
  continue to be developed this way). The Open Source Protobuf project, however,
  historically used Autoconf to build the C++ implementation.
  Over time, other build systems (including Bazel) have been added, thanks in
  large part to substantial contributions from the Open Source community. Since
  the Protobuf project deals with multiple languages (all of which ultimately
  rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a
  project-wide build system -- in fact, Bazel (and its predecessor, Blaze)
  was designed in large part to support exactly this type of rich,
  multi-language build.

Currently, C++ Protobuf can be built with Bazel and CMake. Each of these build
systems has different semantics and structure, but they all need the same lists
of files to build the runtime and compiler.

## Design

### Extracting information from Bazel

Bazel's Starlark API provides [aspects](https://bazel.build/rules/aspects) to
traverse the build graph, inspect build rules, define additional actions, and
expose information through
[providers](https://bazel.build/rules/rules#providers). For example, the
`cc_proto_library` rule uses an aspect to traverse the dependency graph of
`proto_library` rules, and dynamically attaches actions that generate C++ code
using the Protobuf compiler and then compile that code using the C++ compiler.

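To make the traversal concrete, here is a plain-Python model of a *propagating* aspect like the one described for `cc_proto_library`. It is not Starlark and not Bazel's API. The model walks `deps`, visits each target once with dependencies first, and "attaches" one code-generation action per `.proto` source. The `ProtoTarget` type, the labels, and the representation of an action as a `(label, source)` pair are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtoTarget:
    """Hypothetical stand-in for a proto_library target."""

    label: str
    srcs: tuple[str, ...]
    deps: tuple["ProtoTarget", ...] = ()


def apply_propagating_aspect(root: ProtoTarget) -> list[tuple[str, str]]:
    """Visit every target reachable through `deps` once, dependencies first,
    and record one (label, source) "code generation action" per .proto file."""
    visited: set[str] = set()
    actions: list[tuple[str, str]] = []

    def visit(target: ProtoTarget) -> None:
        if target.label in visited:
            return  # in this model, each target is processed only once
        visited.add(target.label)
        for dep in target.deps:
            visit(dep)  # propagate along the `deps` attribute
        for src in target.srcs:
            actions.append((target.label, src))

    visit(root)
    return actions


base = ProtoTarget("//:base_proto", ("base.proto",))
mid = ProtoTarget("//:mid_proto", ("mid.proto",), (base,))
top = ProtoTarget("//:top_proto", ("top.proto",), (mid, base))

actions = apply_propagating_aspect(top)
print(actions)
# [('//:base_proto', 'base.proto'), ('//:mid_proto', 'mid.proto'), ('//:top_proto', 'top.proto')]
```

`top` reaches `base` both directly and through `mid`, yet `base.proto` still gets a single action. The file list aspects described below deliberately do not propagate like this.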
In order to support multiple build systems, the overall build structure is
defined once for each system, while frequently-changing metadata is exposed
from Bazel in a form that those build definitions can include. Primarily, this
metadata is the list of source files.

Two aspects are used to extract this information from the Bazel build
definitions:

*   `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build
    rules like `cc_library`. The sources are exposed through a provider named
    `CcFileList`.
*   `proto_file_list_aspect` extracts the `srcs` from a `proto_library`, and
    also generates the expected filenames that would be generated by the
    Protobuf compiler. This information is exposed through a provider named
    `ProtoFileList`.

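By contrast, `cc_file_list_aspect` only reads the visited rule's own attributes. The plain-Python sketch below models that behavior. It is not the Starlark implementation. A dictionary of attributes stands in for a rule, and the `CcFileList` dataclass only mirrors the three attributes named above. The real provider, as the `cquery` output later in this document shows, also has an `internal_hdrs` field, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcFileList:
    """Simplified model of the CcFileList provider (internal_hdrs omitted)."""

    hdrs: tuple[str, ...] = ()
    srcs: tuple[str, ...] = ()
    textual_hdrs: tuple[str, ...] = ()


def cc_file_list_for(rule_attrs: dict[str, list[str]]) -> CcFileList:
    """Copy srcs, hdrs and textual_hdrs from one cc_library-like rule.

    `deps` is intentionally ignored: nothing is collected transitively.
    """
    return CcFileList(
        hdrs=tuple(rule_attrs.get("hdrs", [])),
        srcs=tuple(rule_attrs.get("srcs", [])),
        textual_hdrs=tuple(rule_attrs.get("textual_hdrs", [])),
    )


# Attributes of target "b" from the cc_dist_library example in this document.
b_attrs = {"srcs": ["b.cc"], "deps": [":c"]}
b_files = cc_file_list_for(b_attrs)
print(b_files)
# CcFileList(hdrs=(), srcs=('b.cc',), textual_hdrs=())
```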
On their own, these aspects have limited utility. However, they can be
instantiated by custom rules, so that an ordinary `BUILD.bazel` target can
produce outputs based on the information gleaned from these aspects.

### (Aside) Distribution libraries

Bazel's native `cc_library` rule is typically used on a "fine-grained" level, so
that, for example, lightweight unit tests can be written with narrow scope.
Although Bazel does build library artifacts (such as `.so` and `.a` files on
Linux), they correspond to individual `cc_library` rules.

Since the entire "Protobuf library" includes many constituent `cc_library`
rules, a special rule, `cc_dist_library`, combines several fine-grained
libraries into a single, monolithic library.

For the Protobuf project, these "distribution libraries" are intended to match
the granularity of the CMake-based builds. Since the Bazel-built distribution
library already covers the rules whose source files the other builds need, the
`cc_dist_library` rule invokes the `cc_file_list_aspect` on its input
libraries. The result is that a `cc_dist_library` rule not only produces
composite library artifacts, but also collects and provides the list of sources
that were inputs.

For example:

```
$ cat cc_dist_library_example/BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_library")
load("//pkg:cc_dist_library.bzl", "cc_dist_library")

cc_library(
    name = "a",
    srcs = ["a.cc"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    deps = [":c"],
)

# N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
cc_library(
    name = "c",
    srcs = ["c.cc"],
)

cc_dist_library(
    name = "lib",
    deps = [
        ":a",
        ":b",
    ],
    visibility = ["//visibility:public"],
)

# Note: the output below has been formatted for clarity:
$ bazel cquery //cc_dist_library_example:lib \
    --output=starlark \
    --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
struct(
    hdrs = depset([]),
    internal_hdrs = depset([]),
    srcs = depset([
        <source file cc_dist_library_example/a.cc>,
        <source file cc_dist_library_example/b.cc>,
    ]),
    textual_hdrs = depset([]),
)
```

The upshot is that the "coarse-grained" library can be defined by the Bazel
build, which then exports the list of source files needed to reproduce the
library in a different build system.

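The example above can be reproduced with a small plain-Python model of how a distribution library combines the file lists of its *immediate* `deps`. The `CcLibrary` type and the first-seen ordering are assumptions for the sketch. The real rule returns `depset`s, as the `cquery` output shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcLibrary:
    """Hypothetical stand-in for a cc_library target."""

    name: str
    srcs: tuple[str, ...] = ()
    hdrs: tuple[str, ...] = ()
    deps: tuple["CcLibrary", ...] = ()


def dist_library_files(deps: list[CcLibrary]) -> dict[str, list[str]]:
    """Merge srcs and hdrs of the listed libraries only (no recursion into
    their deps), dropping duplicates and keeping first-seen order."""
    merged: dict[str, list[str]] = {"srcs": [], "hdrs": []}
    for lib in deps:
        for kind in ("srcs", "hdrs"):
            for path in getattr(lib, kind):
                if path not in merged[kind]:
                    merged[kind].append(path)
    return merged


# The targets from the cc_dist_library example.
c = CcLibrary("c", srcs=("c.cc",))
a = CcLibrary("a", srcs=("a.cc",))
b = CcLibrary("b", srcs=("b.cc",), deps=(c,))

lib_files = dist_library_files([a, b])
print(lib_files)
# {'srcs': ['a.cc', 'b.cc'], 'hdrs': []}
```

`c.cc` is absent even though `b` depends on `c`, which matches the `cquery` output in the example.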
One major difference from most Bazel rule types is that the file list aspects do
not propagate. In other words, they only expose the sources of immediate
dependencies, not transitive sources. This is for two reasons:

1.  Immediate dependencies are conceptually simple, while transitivity requires
    substantially more thought. For example, if transitive dependencies were
    considered, then some way would be needed to exclude dependencies that
    should not be part of the final library (for example, a distribution library
    for `//:protobuf` could be defined not to include all of
    `//:protobuf_lite`). While dependency elision is an interesting design
    problem, the protobuf library is small enough that directly listing
    dependencies should not be problematic.
2.  Dealing only with immediate dependencies gives finer-grained control over
    what goes into the composite library. For example, a Starlark `select()`
    could conditionally add fine-grained libraries to some builds, but not
    others.

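Reason 2 can be illustrated with a toy model of `select()` choosing which fine-grained libraries a distribution library lists. This is a plain-Python simplification. Bazel's real matching rules are richer; for example, a more specialized condition can win over a less specialized one. The `:is_windows` condition and the `:win_io` library are hypothetical names.

```python
def resolve_select(branches: dict[str, list[str]], active: set[str]) -> list[str]:
    """Return the branch whose condition is active, else the default branch.

    Simplified: more than one active condition is treated as an error.
    """
    matches = [cond for cond in branches if cond in active]
    if len(matches) > 1:
        raise ValueError(f"ambiguous select(): {matches}")
    if matches:
        return branches[matches[0]]
    return branches["//conditions:default"]


# Hypothetical deps for a distribution library: one fine-grained library
# is only listed when the (made-up) ":is_windows" condition holds.
common_deps = [":a", ":b"]
platform_deps = {
    ":is_windows": [":win_io"],
    "//conditions:default": [],
}

windows_deps = common_deps + resolve_select(platform_deps, {":is_windows"})
default_deps = common_deps + resolve_select(platform_deps, set())
print(windows_deps)  # [':a', ':b', ':win_io']
print(default_deps)  # [':a', ':b']
```

Because the file list aspects only look at these immediate `deps`, the selected libraries are the ones whose sources appear in the composite file lists.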
Another subtlety for tests is due to Bazel internals. Internally, a slightly
different configuration is used when evaluating `cc_test` rules as compared to
`cc_dist_library`. If `cc_test` targets are included in a `cc_dist_library`
rule, and both are evaluated by Bazel, this can result in a build-time error:
the configuration used for the test contains additional options (telling Bazel
how to execute the test) that the `cc_file_list_aspect` build configuration
does not have. Bazel detects this as two conflicting actions generating the
same outputs. (For `cc_test` rules, the simplest workaround is to provide
sources through a `filegroup` or similar.)

### File list generation

Lists of input files are generated by Bazel in a format that can be imported
into other build systems. Currently, only CMake-style files can be generated.

The lists of files are derived from Bazel build targets. The sources can be:

*   `cc_dist_library` rules (as described above)
*   `proto_library` rules
*   individual files
*   `filegroup` rules
*   `pkg_files` or `pkg_filegroup` rules from
    [rules_pkg](https://github.com/bazelbuild/rules_pkg)

For example:

```
$ cat gen_file_lists_example/BUILD.bazel
load("@protobuf//bazel:proto_library.bzl", "proto_library")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

filegroup(
    name = "doc_files",
    srcs = [
        "README.md",
        "english_paper.md",
    ],
)

proto_library(
    name = "message",
    srcs = ["message.proto"],
)

gen_cmake_file_lists(
    name = "source_lists",
    out = "source_lists.cmake",
    src_libs = {
        ":doc_files": "docs",
        ":message": "buff",
        "//cc_dist_library_example:lib": "distlib",
    },
)

$ bazel build gen_file_lists_example:source_lists
$ cat bazel-bin/gen_file_lists_example/source_lists.cmake
# Auto-generated by //gen_file_lists_example:source_lists
#
# This file contains lists of sources based on Bazel rules. It should
# be included from a hand-written CMake file that defines targets.
#
# Changes to this file will be overwritten based on Bazel definitions.

if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
  include_guard()
endif()

# //gen_file_lists_example:doc_files
set(docs_files
  gen_file_lists_example/README.md
  gen_file_lists_example/english_paper.md
)

# //gen_file_lists_example:message
set(buff_proto_srcs
  gen_file_lists_example/message.proto
)

# //gen_file_lists_example:message
set(buff_srcs
  gen_file_lists_example/message.proto.pb.cc
)

# //gen_file_lists_example:message
set(buff_hdrs
  gen_file_lists_example/message.proto.pb.h
)

# //gen_file_lists_example:message
set(buff_files
  gen_file_lists_example/message-descriptor-set.proto.bin
)

# //cc_dist_library_example:lib
set(distlib_srcs
  cc_dist_library_example/a.cc
  cc_dist_library_example/b.cc
)

# //cc_dist_library_example:lib
set(distlib_hdrs

)
```

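A hand-written CMake build rule could then use the generated file to define
libraries, such as:

```
# Defines variables such as distlib_srcs and buff_srcs from the generated lists.
include(source_lists.cmake)

# buff_srcs names protoc outputs, so those files must be generated before
# this library is compiled.
add_library(distlib ${distlib_srcs} ${buff_srcs})
```
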
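The generated CMake file has a simple layout: a fixed header, an `include_guard()` block, and one `set()` per file list. The Python function below is a hypothetical sketch that reproduces that layout from `(source label, variable name, files)` entries. It is not the implementation in `//pkg:build_systems.bzl`.

```python
def render_cmake_file_lists(
    generator_label: str, entries: list[tuple[str, str, list[str]]]
) -> str:
    """Render (source label, variable name, files) entries as a CMake include file."""
    lines = [
        f"# Auto-generated by {generator_label}",
        "#",
        "# This file contains lists of sources based on Bazel rules. It should",
        "# be included from a hand-written CMake file that defines targets.",
        "#",
        "# Changes to this file will be overwritten based on Bazel definitions.",
        "",
        "if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)",
        "  include_guard()",
        "endif()",
    ]
    for source_label, variable, files in entries:
        lines += ["", f"# {source_label}", f"set({variable}"]
        if files:
            lines += [f"  {path}" for path in files]
        else:
            lines.append("")  # the example output keeps a blank line for empty lists
        lines.append(")")
    return "\n".join(lines) + "\n"


text = render_cmake_file_lists(
    "//gen_file_lists_example:source_lists",
    [
        ("//gen_file_lists_example:doc_files", "docs_files",
         ["gen_file_lists_example/README.md"]),
        ("//gen_file_lists_example:message", "buff_proto_srcs",
         ["gen_file_lists_example/message.proto"]),
    ],
)
print(text)
```

The hand-written CMake file then only references the variable names, so the generated file can be regenerated without touching target definitions.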
### Protobuf usage

The main C++ runtimes (lite and full) and the Protobuf compiler use their
corresponding `cc_dist_library` rules to generate file lists. For
`proto_library` targets, the file list generation can extract the source files
directly. For other targets, notably `cc_test` targets, the file list generators
use `filegroup` rules.

In general, adding new targets to a non-Bazel build system in Protobuf (or
adding a new build system altogether) requires some one-time setup:

1.  The overall structure of the new build system has to be defined. It should
    import lists of files and refer to them by variable, instead of listing
    files directly.
2.  (Only if the build system is new) A new rule type has to be added to
    `//pkg:build_systems.bzl`. Most of the implementation is shared, but a
    "fragment generator" is needed to declare a file list variable, and the
    rule type itself has to be defined and call the shared implementation.

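Step 2 describes a shared implementation plus a per-build-system "fragment generator". A plain-Python sketch of that split might look like the following. The function names, the `FragmentGenerator` type, and the second output syntax are hypothetical, and the actual Starlark code in `//pkg:build_systems.bzl` may be organized differently.

```python
from typing import Callable

# A fragment generator turns one (variable name, files) pair into text.
FragmentGenerator = Callable[[str, list[str]], str]


def cmake_fragment(variable: str, files: list[str]) -> str:
    """CMake syntax, matching the set() blocks in the generated file."""
    body = "".join(f"  {path}\n" for path in files)
    return f"set({variable}\n{body})\n"


def made_up_fragment(variable: str, files: list[str]) -> str:
    """Invented syntax for an imaginary second build system."""
    items = ", ".join(repr(path) for path in files)
    return f"{variable} = [{items}]\n"


def generate_file_lists(
    file_lists: dict[str, list[str]], fragment: FragmentGenerator
) -> str:
    """Shared part: order the variables and delegate the syntax to `fragment`."""
    return "\n".join(
        fragment(variable, files) for variable, files in sorted(file_lists.items())
    )


lists = {"distlib_srcs": ["a.cc", "b.cc"]}
cmake_text = generate_file_lists(lists, cmake_fragment)
other_text = generate_file_lists(lists, made_up_fragment)
print(cmake_text)
print(other_text)  # distlib_srcs = ['a.cc', 'b.cc']
```

In this model, supporting another build system means writing one new fragment function, while iteration and ordering stay in the shared code.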
When files are added or deleted, or when the Protobuf Bazel structure is
changed, these changes may need to be reflected in the file list logic. These
are some example scenarios:

*   Files are added to (or removed from) the `srcs` of an existing `cc_library`:
    no changes needed. If the `cc_library` is already part of a
    `cc_dist_library`, then regenerating the source lists will reflect the
    change.
*   A `cc_library` is added: the new target may need to be added to the Protobuf
    `cc_dist_library` targets, as appropriate.
*   A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted
    target, then a build-time error will result. The library needs to be removed
    from the `cc_dist_library`.
*   A `cc_test` is added or deleted: test sources are handled by `filegroup`
    rules defined in the same package as the `cc_test` rule. The `filegroup`s
    are usually given a name like `"test_srcs"`, and often use `glob()` to find
    sources. This means that adding or removing a test may not require any extra
    work, but this can be verified within the same package as the test rule.
*   Test-only proto files are added: the `proto_library` might need to be added
    to the file list map in `//pkg:BUILD.bazel`, and then the file added to
    various build systems. However, most test-only protos are already exposed
    through libraries like `//src/google/protobuf:test_protos`.

If there are changes, then the regenerated file lists need to be copied back
into the repo. That way, the corresponding build systems can be used with a git
checkout, without needing to run Bazel first.

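Copying regenerated lists back could be automated with a small check that only touches the checked-in copy when its contents differ. The sketch below uses Python's standard `filecmp` and `shutil` modules. The file names, and the idea of running this as a standalone step, are assumptions; this is not the project's actual tooling.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def sync_generated_file(generated: Path, checked_in: Path) -> bool:
    """Copy a regenerated file list over the checked-in copy if they differ.

    Returns True if the checked-in copy was created or updated.
    """
    if checked_in.exists() and filecmp.cmp(generated, checked_in, shallow=False):
        return False  # contents already match; leave the repo untouched
    checked_in.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(generated, checked_in)
    return True


# Demo in a scratch directory with hypothetical paths.
with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp, "bazel-bin", "src_file_lists.cmake")
    checked_in = Path(tmp, "repo", "src_file_lists.cmake")
    generated.parent.mkdir(parents=True)
    generated.write_text("set(distlib_srcs\n  a.cc\n)\n")

    first_sync = sync_generated_file(generated, checked_in)   # copies the file
    second_sync = sync_generated_file(generated, checked_in)  # nothing to do
    print(first_sync, second_sync)  # True False
```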
### (Aside) Distribution archives

A very similar set of rules is defined in `//pkg` to build source distribution
archives for releases. In addition to the full sources, Protobuf releases also
include source archives sliced by language, so that, for example, a Ruby-based
project can get just the sources needed to build the Ruby runtime. (The
per-language slices also include sources needed to build the protobuf compiler,
so they all effectively include the C++ runtime.)

These archives are defined using rules from the
[rules_pkg](https://github.com/bazelbuild/rules_pkg) project. Although they are
similar to `cc_dist_library` and the file list generation rules, the goals are
different: the build system file lists described above only apply to C++, and
are organized according to what should or should not be included in different
parts of the build (e.g., no tests are included in the main library). On the
other hand, the distribution archives deal with languages other than C++, and
contain all the files that need to be distributed as part of a release (even for
C++, this is more than just the C++ sources).

While it might be possible to use information from the `CcFileList` and
`ProtoFileList` providers to define the distribution files, additional files
(such as the various `BUILD.bazel` files) are also needed in the distribution
archive. The lists of distribution files can usually be generated by `glob()`,
anyhow, so sharing logic with the file list aspects may not be beneficial.

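The per-language slicing described above amounts to pattern-based selection. The sketch below is only a rough model, not rules_pkg and not Bazel's `glob()`. It picks the files for one slice with Python's `fnmatch`. Unlike `*` in `glob()`, `fnmatch`'s `*` also matches `/`. The directory names and patterns are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical layout: compiler and C++ runtime sources under src/,
# each language runtime under its own top-level directory.
COMPILER_PATTERNS = ["src/*"]
LANGUAGE_PATTERNS = {
    "python": ["python/*"],
    "ruby": ["ruby/*"],
}


def files_for_slice(all_files: list[str], language: str) -> list[str]:
    """Select one language's files plus the compiler (and so C++ runtime)
    sources, keeping the input order."""
    patterns = LANGUAGE_PATTERNS[language] + COMPILER_PATTERNS
    return [path for path in all_files if any(fnmatch(path, p) for p in patterns)]


repo_files = [
    "src/google/protobuf/message.cc",
    "ruby/lib/google/protobuf.rb",
    "python/google/protobuf/__init__.py",
    "java/Foo.java",
]
ruby_slice = files_for_slice(repo_files, "ruby")
print(ruby_slice)
# ['src/google/protobuf/message.cc', 'ruby/lib/google/protobuf.rb']
```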
Currently, all of the file lists are checked in. However, it would be possible
to build the file lists on-the-fly and include them in the distribution
archives, rather than checking them in.