This license collection script is, fundamentally, one giant pile of
special cases. As such, while there is an attempt to model the rules
that apply to licenses and apply some sort of order to the process,
the code is less than clear. This file attempts to provide an overview.

main.dart is the core of the operation. It first walks the entire
directory tree starting from the root of the repository (which is to
be specified on the command line as the only argument), creating an
in-memory representation of the project (make sure to run this only
after you've run gclient sync, so that all dependencies are on disk).
This is the step that is labeled "Preparing data structures".

Then, it walks this in-memory representation, attempting to assign
one or more licenses to each file. This is the step labeled
"Collecting licenses", which takes a long time.

Finally, it prints out these licenses.
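
The three phases can be pictured as a simple pipeline. The following is a minimal Python sketch for illustration only (the real script is written in Dart); the function names and the naive "any line containing Copyright" detector are hypothetical stand-ins, not the script's actual code.

```python
import sys
from pathlib import Path


def find_licenses(path: Path) -> list[str]:
    # Hypothetical stand-in for the real license scanner: it treats
    # every line mentioning "Copyright" as a license statement.
    text = path.read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if "Copyright" in line]


def prepare_data_structures(root: Path) -> list[Path]:
    # Phase 1 ("Preparing data structures"): walk the tree once and keep
    # an in-memory list of every file under the repository root.
    return sorted(p for p in root.rglob("*") if p.is_file())


def collect_licenses(files: list[Path]) -> dict[str, list[Path]]:
    # Phase 2 ("Collecting licenses"): assign licenses to each file and
    # group the files by license text so each license is printed once.
    by_license: dict[str, list[Path]] = {}
    for path in files:
        for license_text in find_licenses(path):
            by_license.setdefault(license_text, []).append(path)
    return by_license


def print_licenses(by_license: dict[str, list[Path]]) -> None:
    # Phase 3: print every distinct license.
    for license_text in sorted(by_license):
        print(license_text)


def main(argv: list[str]) -> None:
    # argv[1] is the repository root, the single command-line argument.
    if len(argv) != 2:
        sys.exit(f"usage: {argv[0]} <repository root>")
    print("Preparing data structures...", file=sys.stderr)
    files = prepare_data_structures(Path(argv[1]))
    print("Collecting licenses...", file=sys.stderr)
    print_licenses(collect_licenses(files))
```

Calling `main(["licenses", "/path/to/checkout"])` would run all three phases over that checkout. Keeping the walk separate from the scan mirrors the split between "Preparing data structures" and "Collecting licenses".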

The in-memory representation is a tree of RepositoryEntry objects.
There are three important types of these objects: RepositoryDirectory
objects, which represent directories; RepositoryLicensedFile objects,
which represent source files and resources that might end up in the
binary; and RepositoryLicenseFile objects, which represent license
files that do not themselves end up in the binary other than as a
side-effect of this script.

RepositoryDirectory objects contain three lists: the list of
RepositoryDirectory subdirectories, the list of RepositoryLicensedFile
children, and the list of RepositoryLicenseFile children.
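
A rough Python model of that tree follows. Only the three class names come from the script; the field names and the `licensed_files` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class RepositoryEntry:
    name: str


@dataclass
class RepositoryLicensedFile(RepositoryEntry):
    """A source file or resource that might end up in the binary."""


@dataclass
class RepositoryLicenseFile(RepositoryEntry):
    """A license file; it only reaches the output via this script."""


@dataclass
class RepositoryDirectory(RepositoryEntry):
    # The three child lists described above.
    subdirectories: list["RepositoryDirectory"] = field(default_factory=list)
    files: list[RepositoryLicensedFile] = field(default_factory=list)
    license_files: list[RepositoryLicenseFile] = field(default_factory=list)

    def licensed_files(self) -> Iterator[RepositoryLicensedFile]:
        # Hypothetical helper: yield every licensed file in this subtree,
        # this directory's own files first, then each subdirectory's.
        yield from self.files
        for sub in self.subdirectories:
            yield from sub.licensed_files()
```

Keeping license files in their own list means a traversal like `licensed_files` never mistakes a LICENSE file for something that ships in the binary.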

RepositoryDirectory objects are the objects that crawl the filesystem.

While the script is pretty conservative (probably including more
licenses than strictly necessary), it tries to avoid including
material that isn't actually used. To do this, RepositoryDirectory
objects only crawl directories and files for which shouldRecurse
returns true. For example, shouldRecurse returns false for ".git"
files.
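
A minimal Python sketch of this kind of filtering prunes ignored entries during the walk so they are never visited. `should_recurse` mirrors the role of shouldRecurse, but the ignore list is invented for this example; the script's real rules are far more numerous.

```python
import os

# Illustrative ignore list (an assumption, not the script's real list).
IGNORED_NAMES = {".git", ".gitattributes", ".gitignore"}


def should_recurse(name: str) -> bool:
    # Skip version-control metadata that never ships in the binary.
    return name not in IGNORED_NAMES


def crawl(root: str) -> list[str]:
    """Return paths (relative to root) of every file not filtered out."""
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into
        # the filtered-out directories at all.
        dirnames[:] = sorted(d for d in dirnames if should_recurse(d))
        for name in sorted(filenames):
            if should_recurse(name):
                kept.append(os.path.relpath(os.path.join(dirpath, name), root))
    return kept
```

Pruning `dirnames` rather than filtering results afterwards matters for large checkouts: an ignored directory's contents are never even listed.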

Some directories and files require special handling and have specific
subclasses of the above classes. RepositoryDirectory creates the nodes
of the tree by calling createSubdirectory and createFile, which return
objects of the appropriate class.
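
The dispatch can be sketched as a pair of overridable factory methods. In this hypothetical Python version, `create_subdirectory` and `create_file` play the roles of createSubdirectory and createFile; the special cases (a "third_party" directory, "LICENSE", "COPYING", and "NOTICE" files) are made up purely to show how a subclass can change which node types get created.

```python
class RepositoryEntry:
    def __init__(self, name: str) -> None:
        self.name = name


class RepositoryLicensedFile(RepositoryEntry):
    pass


class RepositoryLicenseFile(RepositoryEntry):
    pass


class RepositoryDirectory(RepositoryEntry):
    # Hypothetical set of names treated as license files.
    LICENSE_NAMES = {"LICENSE", "COPYING"}

    def create_subdirectory(self, name: str) -> "RepositoryDirectory":
        # Return a special-purpose directory class for particular subtrees.
        if name == "third_party":
            return ThirdPartyDirectory(name)
        return RepositoryDirectory(name)

    def create_file(self, name: str) -> RepositoryEntry:
        # Decide which node type represents this file.
        if name in self.LICENSE_NAMES:
            return RepositoryLicenseFile(name)
        return RepositoryLicensedFile(name)


class ThirdPartyDirectory(RepositoryDirectory):
    """Hypothetical subclass with its own rules for its children."""

    def create_file(self, name: str) -> RepositoryEntry:
        # Example override (illustrative only): inside this subtree a
        # NOTICE file is also treated as a license file.
        if name == "NOTICE":
            return RepositoryLicenseFile(name)
        return super().create_file(name)
```

Because the crawler always goes through these two hooks, a special case only has to override the hook it cares about; the walking code itself stays generic.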


The low-level handling of files is done by classes in filesystem.dart.
This code supports transparently crawling into archives (e.g. .jar
files), as well as handling both UTF-8 and Latin-1 text. It contains a
lot of magic and hard-coded file names to distinguish binary files
from text files, among other things.
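
Two of those chores, plus reading archive members, are sketched below in Python. The extension list and the NUL-byte test are assumptions chosen for illustration, not the rules filesystem.dart actually uses; the archive helper relies on the fact that a .jar file is a ZIP archive.

```python
import io
import zipfile

# Illustrative list of extensions treated as binary without looking inside.
BINARY_EXTENSIONS = {".png", ".jpg", ".gif", ".ttf", ".so", ".a"}


def looks_binary(name: str, data: bytes) -> bool:
    # Cheap check first: a known binary extension.
    if any(name.endswith(ext) for ext in BINARY_EXTENSIONS):
        return True
    # Heuristic: text files essentially never contain NUL bytes.
    return b"\x00" in data[:8192]


def decode_text(data: bytes) -> str:
    # Try UTF-8 first. Latin-1 maps every byte to a code point, so the
    # fallback cannot fail (though it may mis-render other encodings).
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def archive_members(data: bytes) -> dict[str, bytes]:
    # A .jar is a ZIP file, so its members can be read like any other files.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

Trying UTF-8 first is the safer order: valid UTF-8 is rarely produced by accident, whereas Latin-1 accepts any byte sequence.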

This code uses the cache described in cache.dart to avoid reopening
the same file many times in a row.
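
The general idea is a small least-recently-used cache in front of file reads. This Python sketch is an assumption about the approach rather than a description of cache.dart; the `FileCache` class, its default capacity, and the injected `loader` are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class FileCache:
    """Least-recently-used cache of file contents (illustrative)."""

    def __init__(self, loader: Callable[[str], bytes], capacity: int = 64) -> None:
        self._loader = loader  # reads a file from disk on a cache miss
        self._capacity = capacity
        self._entries: OrderedDict[str, bytes] = OrderedDict()

    def get(self, path: str) -> bytes:
        if path in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(path)
            return self._entries[path]
        data = self._loader(path)
        self._entries[path] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return data
```

Injecting the loader keeps the cache independent of how files are actually read, which also makes it easy to test without touching the disk.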


In the case of a binary file, the license is found by crawling around
the directory structure looking for a "default" license file. In the
case of text files, though, the file itself often mentions the
license, so the file itself is inspected for copyright or license
text. This scanning is done by determineLicensesFor() in licenses.dart.
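
The two strategies can be sketched in Python as follows. The default license names, the naive Copyright-line scan, and the choice to fall back to the default license when a text file has no header are all assumptions for illustration; paths are handled as pure strings so the example never touches the disk.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical names that count as a directory's "default" license.
DEFAULT_LICENSE_NAMES = ("LICENSE", "COPYING")


def default_license_for(file_path: str, license_files: set[str]) -> Optional[str]:
    """Return the nearest enclosing default license file, or None."""
    # parents runs from the immediate directory up to the root ".".
    for directory in PurePosixPath(file_path).parents:
        for name in DEFAULT_LICENSE_NAMES:
            candidate = str(directory / name)
            if candidate in license_files:
                return candidate
    return None


def licenses_for(file_path: str, text: Optional[str], license_files: set[str]) -> list[str]:
    """Text files are scanned first; binaries (text=None) use the default."""
    if text is not None:
        # Naive stand-in for the pattern-based scan in determineLicensesFor().
        found = [line.strip() for line in text.splitlines() if "Copyright" in line]
        if found:
            return found
    default = default_license_for(file_path, license_files)
    return [default] if default is not None else []
```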

This function uses patterns that are themselves in patterns.dart. In
this file we find all manner of long, complicated, and somewhat crazy
regular expressions. This is where you see quite how absurd this work
can actually be. It is left as an exercise to the reader to look for
the implications of many of the regular expressions; as one example,
though, consider the case of the pattern that matches the AFL/LGPL
dual license statement: there is one file in which the ZIP code for
the Free Software Foundation is off by one, for no clear reason,
leading to the pattern ending with "MA 0211[01]-1307, USA".
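
The character class `[01]` is what lets one pattern accept both ZIP codes. The Python snippet below uses only the address tail quoted above, not the full dual-license pattern from patterns.dart.

```python
import re

# Shortened pattern: just the address tail quoted above, not the full
# AFL/LGPL dual-license statement that patterns.dart matches.
FSF_ADDRESS_TAIL = re.compile(r"MA 0211[01]-1307, USA")

for address in (
    "MA 02111-1307, USA",
    "MA 02110-1307, USA",
    "MA 02112-1307, USA",
):
    # [01] accepts either final digit, so only the last address fails.
    print(address, "->", bool(FSF_ADDRESS_TAIL.search(address)))
```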


The licenses.dart file also contains the License object, the currently
simplistic normalizer (_reformat) for license text (which mostly just
removes comment syntax), the code that attempts to determine which
copyrights apply to which licenses, and the code that attempts to
identify the licenses themselves (at a high level), to make sure that
appropriate clauses are followed (e.g. including the copyright with a
BSD notice).
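
A normalizer of this kind mostly peels comment markers off each line. The Python sketch below is a hypothetical approximation of _reformat that handles only `//`, `#`, and C-style `/* ... */` blocks with leading `*`; the real function's rules are different.

```python
import re

# Comment prefixes stripped from the start of each line (illustrative).
_PREFIX = re.compile(r"^\s*(?://+|#+|/\*+|\*+(?!/))\s?")
# Block-comment terminators at the end of a line.
_SUFFIX = re.compile(r"\s*\*+/\s*$")


def reformat(text: str) -> str:
    """Strip comment syntax from license text and trim blank edge lines."""
    lines = []
    for line in text.splitlines():
        line = _SUFFIX.sub("", line)
        line = _PREFIX.sub("", line)
        lines.append(line.rstrip())
    # Drop blank lines left at the start and end by comment delimiters.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)
```

Normalizing first means two copies of the same license, one in a `//` header and one in a `/* */` block, compare equal and are printed only once.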