• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1
2# Starlark in Go: Implementation
3
4This document (a work in progress) describes some of the design
5choices of the Go implementation of Starlark.
6
7  * [Scanner](#scanner)
8  * [Parser](#parser)
9  * [Resolver](#resolver)
10  * [Evaluator](#evaluator)
11    * [Data types](#data-types)
12    * [Freezing](#freezing)
13  * [Testing](#testing)
14
15
16## Scanner
17
18The scanner is derived from Russ Cox's
19[buildifier](https://github.com/bazelbuild/buildtools/tree/master/buildifier)
20tool, which pretty-prints Bazel BUILD files.
21
22Most of the work happens in `(*scanner).nextToken`.
23
24## Parser
25
26The parser is hand-written recursive-descent parser. It uses the
27technique of [precedence
28climbing](http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing)
29to reduce the number of productions.
30
31In some places the parser accepts a larger set of programs than are
32strictly valid, leaving the task of rejecting them to the subsequent
33resolver pass. For example, in the function call `f(a, b=c)` the
34parser accepts any expression for `a` and `b`, even though `b` may
35legally be only an identifier. For the parser to distinguish these
36cases would require additional lookahead.
37
38## Resolver
39
40The resolver reports structural errors in the program, such as the use
41of `break` and `continue` outside of a loop.
42
43Starlark has stricter syntactic limitations than Python. For example,
44it does not permit `for` loops or `if` statements at top level, nor
45does it permit global variables to be bound more than once.
46These limitations come from the Bazel project's desire to make it easy
47to identify the sole statement that defines each global, permitting
48accurate cross-reference documentation.
49
50In addition, the resolver validates all variable names, classifying
51them as references to universal, global, local, or free variables.
52Local and free variables are mapped to a small integer, allowing the
53evaluator to use an efficient (flat) representation for the
54environment.
55
56Not all features of the Go implementation are "standard" (that is,
57supported by Bazel's Java implementation), at least for now, so
58non-standard features such as `lambda`, `float`, and `set`
59are flag-controlled.  The resolver reports
60any uses of dialect features that have not been enabled.
61
62
63## Evaluator
64
65### Data types
66
67<b>Integers:</b> Integers are representing using `big.Int`, an
68arbitrary precision integer. This representation was chosen because,
69for many applications, Starlark must be able to handle without loss
70protocol buffer values containing signed and unsigned 64-bit integers,
71which requires 65 bits of precision.
72
73Small integers (<256) are preallocated, but all other values require
74memory allocation. Integer performance is relatively poor, but it
75matters little for Bazel-like workloads which depend much
76more on lists of strings than on integers. (Recall that a typical loop
77over a list in Starlark does not materialize the loop index as an `int`.)
78
79An optimization worth trying would be to represent integers using
80either an `int32` or `big.Int`, with the `big.Int` used only when
81`int32` does not suffice. Using `int32`, not `int64`, for "small"
82numbers would make it easier to detect overflow from operations like
83`int32 * int32`, which would trigger the use of `big.Int`.
84
85<b>Floating point</b>:
86Floating point numbers are represented using Go's `float64`.
87Again, `float` support is required to support protocol buffers. The
88existence of floating-point NaN and its infamous comparison behavior
89(`NaN != NaN`) had many ramifications for the API, since we cannot
90assume the result of an ordered comparison is either less than,
91greater than, or equal: it may also fail.
92
93<b>Strings</b>:
94
95TODO: discuss UTF-8 and string.bytes method.
96
97<b>Dictionaries and sets</b>:
98Starlark dictionaries have predictable iteration order.
99Furthermore, many Starlark values are hashable in Starlark even though
100the Go values that represent them are not hashable in Go: big
101integers, for example.
102Consequently, we cannot use Go maps to implement Starlark's dictionary.
103
104We use a simple hash table whose buckets are linked lists, each
105element of which holds up to 8 key/value pairs. In a well-distributed
106table the list should rarely exceed length 1. In addition, each
107key/value item is part of doubly-linked list that maintains the
108insertion order of the elements for iteration.
109
110<b>Struct:</b>
111The `starlarkstruct` Go package provides a non-standard Starlark
112extension data type, `struct`, that maps field identifiers to
113arbitrary values. Fields are accessed using dot notation: `y = s.f`.
114This data type is extensively used in Bazel, but its specification is
115currently evolving.
116
117Starlark has no `class` mechanism, nor equivalent of Python's
118`namedtuple`, though it is likely that future versions will support
119some way to define a record data type of several fields, with a
120representation more efficient than a hash table.
121
122
123### Freezing
124
125All mutable values created during module initialization are _frozen_
126upon its completion. It is this property that permits a Starlark module
127to be referenced by two Starlark threads running concurrently (such as
128the initialization threads of two other modules) without the
129possibility of a data race.
130
131The Go implementation supports freezing by storing an additional
132"frozen" Boolean variable in each mutable object. Once this flag is set,
133all subsequent attempts at mutation fail. Every value defines a
134Freeze method that sets its own frozen flag if not already set, and
135calls Freeze for each value that it contains.
136For example, when a list is frozen, it freezes each of its elements;
137when a dictionary is frozen, it freezes each of its keys and values;
138and when a function value is frozen, it freezes each of the free
139variables and parameter default values implicitly referenced by its closure.
140Application-defined types must also follow this discipline.
141
142The freeze mechanism in the Go implementation is finer grained than in
143the Java implementation: in effect, the latter has one "frozen" flag
144per module, and every value holds a reference to the frozen flag of
145its module. This makes setting the frozen flag more efficient---a
146simple bit flip, no need to traverse the object graph---but coarser
147grained. Also, it complicates the API slightly because to construct a
148list, say, requires a reference to the frozen flag it should use.
149
150The Go implementation would also permit the freeze operation to be
151exposed to the program, for example as a built-in function.
152This has proven valuable in writing tests of the freeze mechanism
153itself, but is otherwise mostly a curiosity.
154
155
156### Fail-fast iterators
157
158In some languages (such as Go), a program may mutate a data structure
159while iterating over it; for example, a range loop over a map may
160delete map elements. In other languages (such as Java), iterators do
161extra bookkeeping so that modification of the underlying collection
162invalidates the iterator, and the next attempt to use it fails.
163This often helps to detect subtle mistakes.
164
165Starlark takes this a step further. Instead of mutation of the
166collection invalidating the iterator, the act of iterating makes the
167collection temporarily immutable, so that an attempt to, say, delete a
168dict element while looping over the dict, will fail. The error is
169reported against the delete operation, not the iteration.
170
171This is implemented by having each mutable iterable value record a
172counter of active iterators. Starting a loop increments this counter,
173and completing a loop decrements it. A collection with a nonzero
174counter behaves as if frozen. If the collection is actually frozen,
175the counter bookkeeping is unnecessary. (Consequently, iterator
176bookkeeping is needed only while objects are still mutable, before
177they can have been published to another thread, and thus no
178synchronization is necessary.)
179
180A consequence of this design is that in the Go API, it is imperative
181to call `Done` on each iterator once it is no longer needed.
182
183```
184TODO
185starlark.Value interface and subinterfaces
186argument passing to builtins: UnpackArgs, UnpackPositionalArgs.
187```
188
189<b>Evaluation strategy:</b>
190The evaluator uses a simple recursive tree walk, returning a value or
191an error for each expression. We have experimented with just-in-time
192compilation of syntax trees to bytecode, but two limitations in the
193current Go compiler prevent this strategy from outperforming the
194tree-walking evaluator.
195
196First, the Go compiler does not generate a "computed goto" for a
197switch statement ([Go issue
1985496](https://github.com/golang/go/issues/5496)). A bytecode
199interpreter's main loop is a for-loop around a switch statement with
200dozens or hundreds of cases, and the speed with which each case can be
201dispatched strongly affects overall performance.
202Currently, a switch statement generates a binary tree of ordered
203comparisons, requiring several branches instead of one.
204
205Second, the Go compiler's escape analysis assumes that the underlying
206array from a `make([]Value, n)` allocation always escapes
207([Go issue 20533](https://github.com/golang/go/issues/20533)).
208Because the bytecode interpreter's operand stack has a non-constant
209length, it must be allocated with `make`. The resulting allocation
210adds to the cost of each Starlark function call; this can be tolerated
211by amortizing one very large stack allocation across many calls.
212More problematic appears to be the cost of the additional GC write
213barriers incurred by every VM operation: every intermediate result is
214saved to the VM's operand stack, which is on the heap.
215By contrast, intermediate results in the tree-walking evaluator are
216never stored to the heap.
217
218```
219TODO
220frames, backtraces, errors.
221threads
222Print
223Load
224```
225
226## Testing
227
228```
229TODO
230starlarktest package
231`assert` module
232starlarkstruct
233integration with Go testing.T
234```
235
236
237## TODO
238
239
240```
241Discuss practical separation of code and data.
242```
243