1 2# Starlark in Go: Implementation 3 4This document (a work in progress) describes some of the design 5choices of the Go implementation of Starlark. 6 7 * [Scanner](#scanner) 8 * [Parser](#parser) 9 * [Resolver](#resolver) 10 * [Evaluator](#evaluator) 11 * [Data types](#data-types) 12 * [Freezing](#freezing) 13 * [Testing](#testing) 14 15 16## Scanner 17 18The scanner is derived from Russ Cox's 19[buildifier](https://github.com/bazelbuild/buildtools/tree/master/buildifier) 20tool, which pretty-prints Bazel BUILD files. 21 22Most of the work happens in `(*scanner).nextToken`. 23 24## Parser 25 26The parser is hand-written recursive-descent parser. It uses the 27technique of [precedence 28climbing](http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing) 29to reduce the number of productions. 30 31In some places the parser accepts a larger set of programs than are 32strictly valid, leaving the task of rejecting them to the subsequent 33resolver pass. For example, in the function call `f(a, b=c)` the 34parser accepts any expression for `a` and `b`, even though `b` may 35legally be only an identifier. For the parser to distinguish these 36cases would require additional lookahead. 37 38## Resolver 39 40The resolver reports structural errors in the program, such as the use 41of `break` and `continue` outside of a loop. 42 43Starlark has stricter syntactic limitations than Python. For example, 44it does not permit `for` loops or `if` statements at top level, nor 45does it permit global variables to be bound more than once. 46These limitations come from the Bazel project's desire to make it easy 47to identify the sole statement that defines each global, permitting 48accurate cross-reference documentation. 49 50In addition, the resolver validates all variable names, classifying 51them as references to universal, global, local, or free variables. 52Local and free variables are mapped to a small integer, allowing the 53evaluator to use an efficient (flat) representation for the 54environment. 55 56Not all features of the Go implementation are "standard" (that is, 57supported by Bazel's Java implementation), at least for now, so 58non-standard features such as `lambda`, `float`, and `set` 59are flag-controlled. The resolver reports 60any uses of dialect features that have not been enabled. 61 62 63## Evaluator 64 65### Data types 66 67<b>Integers:</b> Integers are representing using `big.Int`, an 68arbitrary precision integer. This representation was chosen because, 69for many applications, Starlark must be able to handle without loss 70protocol buffer values containing signed and unsigned 64-bit integers, 71which requires 65 bits of precision. 72 73Small integers (<256) are preallocated, but all other values require 74memory allocation. Integer performance is relatively poor, but it 75matters little for Bazel-like workloads which depend much 76more on lists of strings than on integers. (Recall that a typical loop 77over a list in Starlark does not materialize the loop index as an `int`.) 78 79An optimization worth trying would be to represent integers using 80either an `int32` or `big.Int`, with the `big.Int` used only when 81`int32` does not suffice. Using `int32`, not `int64`, for "small" 82numbers would make it easier to detect overflow from operations like 83`int32 * int32`, which would trigger the use of `big.Int`. 84 85<b>Floating point</b>: 86Floating point numbers are represented using Go's `float64`. 87Again, `float` support is required to support protocol buffers. The 88existence of floating-point NaN and its infamous comparison behavior 89(`NaN != NaN`) had many ramifications for the API, since we cannot 90assume the result of an ordered comparison is either less than, 91greater than, or equal: it may also fail. 92 93<b>Strings</b>: 94 95TODO: discuss UTF-8 and string.bytes method. 96 97<b>Dictionaries and sets</b>: 98Starlark dictionaries have predictable iteration order. 99Furthermore, many Starlark values are hashable in Starlark even though 100the Go values that represent them are not hashable in Go: big 101integers, for example. 102Consequently, we cannot use Go maps to implement Starlark's dictionary. 103 104We use a simple hash table whose buckets are linked lists, each 105element of which holds up to 8 key/value pairs. In a well-distributed 106table the list should rarely exceed length 1. In addition, each 107key/value item is part of doubly-linked list that maintains the 108insertion order of the elements for iteration. 109 110<b>Struct:</b> 111The `starlarkstruct` Go package provides a non-standard Starlark 112extension data type, `struct`, that maps field identifiers to 113arbitrary values. Fields are accessed using dot notation: `y = s.f`. 114This data type is extensively used in Bazel, but its specification is 115currently evolving. 116 117Starlark has no `class` mechanism, nor equivalent of Python's 118`namedtuple`, though it is likely that future versions will support 119some way to define a record data type of several fields, with a 120representation more efficient than a hash table. 121 122 123### Freezing 124 125All mutable values created during module initialization are _frozen_ 126upon its completion. It is this property that permits a Starlark module 127to be referenced by two Starlark threads running concurrently (such as 128the initialization threads of two other modules) without the 129possibility of a data race. 130 131The Go implementation supports freezing by storing an additional 132"frozen" Boolean variable in each mutable object. Once this flag is set, 133all subsequent attempts at mutation fail. Every value defines a 134Freeze method that sets its own frozen flag if not already set, and 135calls Freeze for each value that it contains. 136For example, when a list is frozen, it freezes each of its elements; 137when a dictionary is frozen, it freezes each of its keys and values; 138and when a function value is frozen, it freezes each of the free 139variables and parameter default values implicitly referenced by its closure. 140Application-defined types must also follow this discipline. 141 142The freeze mechanism in the Go implementation is finer grained than in 143the Java implementation: in effect, the latter has one "frozen" flag 144per module, and every value holds a reference to the frozen flag of 145its module. This makes setting the frozen flag more efficient---a 146simple bit flip, no need to traverse the object graph---but coarser 147grained. Also, it complicates the API slightly because to construct a 148list, say, requires a reference to the frozen flag it should use. 149 150The Go implementation would also permit the freeze operation to be 151exposed to the program, for example as a built-in function. 152This has proven valuable in writing tests of the freeze mechanism 153itself, but is otherwise mostly a curiosity. 154 155 156### Fail-fast iterators 157 158In some languages (such as Go), a program may mutate a data structure 159while iterating over it; for example, a range loop over a map may 160delete map elements. In other languages (such as Java), iterators do 161extra bookkeeping so that modification of the underlying collection 162invalidates the iterator, and the next attempt to use it fails. 163This often helps to detect subtle mistakes. 164 165Starlark takes this a step further. Instead of mutation of the 166collection invalidating the iterator, the act of iterating makes the 167collection temporarily immutable, so that an attempt to, say, delete a 168dict element while looping over the dict, will fail. The error is 169reported against the delete operation, not the iteration. 170 171This is implemented by having each mutable iterable value record a 172counter of active iterators. Starting a loop increments this counter, 173and completing a loop decrements it. A collection with a nonzero 174counter behaves as if frozen. If the collection is actually frozen, 175the counter bookkeeping is unnecessary. (Consequently, iterator 176bookkeeping is needed only while objects are still mutable, before 177they can have been published to another thread, and thus no 178synchronization is necessary.) 179 180A consequence of this design is that in the Go API, it is imperative 181to call `Done` on each iterator once it is no longer needed. 182 183``` 184TODO 185starlark.Value interface and subinterfaces 186argument passing to builtins: UnpackArgs, UnpackPositionalArgs. 187``` 188 189<b>Evaluation strategy:</b> 190The evaluator uses a simple recursive tree walk, returning a value or 191an error for each expression. We have experimented with just-in-time 192compilation of syntax trees to bytecode, but two limitations in the 193current Go compiler prevent this strategy from outperforming the 194tree-walking evaluator. 195 196First, the Go compiler does not generate a "computed goto" for a 197switch statement ([Go issue 1985496](https://github.com/golang/go/issues/5496)). A bytecode 199interpreter's main loop is a for-loop around a switch statement with 200dozens or hundreds of cases, and the speed with which each case can be 201dispatched strongly affects overall performance. 202Currently, a switch statement generates a binary tree of ordered 203comparisons, requiring several branches instead of one. 204 205Second, the Go compiler's escape analysis assumes that the underlying 206array from a `make([]Value, n)` allocation always escapes 207([Go issue 20533](https://github.com/golang/go/issues/20533)). 208Because the bytecode interpreter's operand stack has a non-constant 209length, it must be allocated with `make`. The resulting allocation 210adds to the cost of each Starlark function call; this can be tolerated 211by amortizing one very large stack allocation across many calls. 212More problematic appears to be the cost of the additional GC write 213barriers incurred by every VM operation: every intermediate result is 214saved to the VM's operand stack, which is on the heap. 215By contrast, intermediate results in the tree-walking evaluator are 216never stored to the heap. 217 218``` 219TODO 220frames, backtraces, errors. 221threads 222Print 223Load 224``` 225 226## Testing 227 228``` 229TODO 230starlarktest package 231`assert` module 232starlarkstruct 233integration with Go testing.T 234``` 235 236 237## TODO 238 239 240``` 241Discuss practical separation of code and data. 242``` 243