---
title: Thoughts on Survey Tool Backend
---

# Thoughts on Survey Tool Backend

Here are some initial thoughts on reworking the Survey Tool backend.

## The problem

It is clear that the Survey Tool needs major performance and reliability improvements. For example, just checking now:

- ~2 users, 226pg/uptime: 1:55:00/load:33%
- Takes about 1.5 minutes to open a new locale (Croatian).
- Takes about 2.5 minutes to open the vetting view.
- Takes about 0.4 minutes to open a zoomed item view.
    - These are first-load times; subsequent zoomed views seem to be quite fast.

It also places quite a load on the Unicode server, doesn’t scale well to many users, and needs to be rebooted very often. So here are some thoughts on a possible re-architecture.

## Data

Here is roughly the data and structure we need. This is a view from the outside, though: Steven is really the one who knows the internals.

pathId → path

path → pathId

*// The path→pathId mapping using StringId is algorithmic and immutable. Both maps can be kept in memory as an optimization.*

*// We could use PrettyPath, but the downside is that we would have to update it every time we change the DTD.*

voter → organization, voterLevel (e.g. VETTER), authorizedLocales

*// Relatively constant data.*

pathId → valueInfo+

*// Ordered by voteCount (highest first), then by UCA order, so the first is the winning value and the second is the ‘next best’.*

 valueInfo = value, isInherited, coverageLevel, voteCount, voter\*, errorStatus\*, example?

 *// That is, a value like “Sonntag”, whether the value is inherited, the coverage level (computed algorithmically), the voteCount (computed from the voters, then cached), the errorStatus (computed and cached), and the example text (computed and cached). Maybe add dependentPaths\* (see below).*

 errorStatus = error/warningID, message

value → pathId\*

*// Used for computing display collisions.*

staleLocales → locale\*

*// Used for updating the cache.*
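To make the shape of this data concrete, here is a minimal sketch in Python. It is not the Survey Tool's actual code: the class and function names (`ValueInfo`, `ErrorStatus`, `sort_value_infos`, `build_value_index`) are invented for illustration, every voter is assumed to count as exactly one vote, and plain string comparison stands in for UCA ordering.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ErrorStatus:
    status_id: str  # the error/warningID
    message: str


@dataclass
class ValueInfo:
    value: str
    is_inherited: bool = False
    coverage_level: Optional[str] = None  # computed algorithmically elsewhere
    voters: Set[str] = field(default_factory=set)
    error_statuses: List[ErrorStatus] = field(default_factory=list)
    example: Optional[str] = None

    @property
    def vote_count(self) -> int:
        # Assumption: every voter counts as exactly one vote.
        return len(self.voters)


def sort_value_infos(infos: List[ValueInfo]) -> List[ValueInfo]:
    # Highest voteCount first; ties are broken by plain string order,
    # which only stands in for UCA collation in this sketch.
    return sorted(infos, key=lambda info: (-info.vote_count, info.value))


def build_value_index(path_values):
    """Build value -> set of pathIds from pathId -> list of ValueInfo."""
    index = defaultdict(set)
    for path_id, infos in path_values.items():
        for info in infos:
            index[info.value].add(path_id)
    return index
```

With this layout, the winning value for a path is simply the first element of the sorted list, and any entry of the value index that holds more than one pathId is a candidate display collision.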

## Operation

1. When a user votes for a value, the valueInfos are re-sorted if necessary.
2. When a user adds a new value, all of its valueInfo fields, including errorStatus and example, are computed. The valueInfo is added to pathId→valueInfo+, and the pathId is added to value→pathId\*. (If a value is deleted, the corresponding entries are removed.)
3. In either of these cases, if the first (winning) value changes, the locale and its children are added to the staleLocales queue.
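The three steps above can be sketched as follows. This is a hypothetical in-memory model, not the real backend: `LocaleData`, `add_value`, and `vote` are invented names, each voter is assumed to hold at most one vote per path, string order again stands in for UCA, and an insertion-ordered `dict` plays the role of the staleLocales queue. The errorStatus and example computation is only marked by a comment.

```python
from collections import defaultdict


class LocaleData:
    """Hypothetical per-locale store: pathId -> {value: set of voters}."""

    def __init__(self, locale, children=()):
        self.locale = locale
        self.children = list(children)
        self.votes = defaultdict(dict)
        self.value_to_path_ids = defaultdict(set)

    def winner(self, path_id):
        values = self.votes[path_id]
        # Most votes first, then string order (a stand-in for UCA).
        ranked = sorted(values, key=lambda v: (-len(values[v]), v))
        return ranked[0] if ranked else None


def _mark_stale_if_changed(data, stale, path_id, previous_winner):
    # Step 3: a changed winner makes the locale and its children stale.
    if data.winner(path_id) != previous_winner:
        for locale in (data.locale, *data.children):
            stale.setdefault(locale, None)


def add_value(data, stale, path_id, value):
    # Step 2: register the value in both maps.
    previous = data.winner(path_id)
    data.votes[path_id].setdefault(value, set())
    data.value_to_path_ids[value].add(path_id)
    # errorStatus and example for the new value would be computed here.
    _mark_stale_if_changed(data, stale, path_id, previous)


def vote(data, stale, path_id, value, voter):
    # Step 1: record the vote; ranking is recomputed on demand by winner().
    previous = data.winner(path_id)
    candidates = data.votes[path_id]
    if value not in candidates:
        add_value(data, stale, path_id, value)
    for voters in candidates.values():
        voters.discard(voter)  # assumption: one vote per voter per path
    candidates[value].add(voter)
    _mark_stale_if_changed(data, stale, path_id, previous)
```

In this model a vote that does not change the winner leaves the queue untouched, which is the property that keeps ordinary voting cheap.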

## StaleLocales Queue

A (logical) “LinkedHashSet” of locales (but any child locale is automatically repositioned after its parents).

1. A separate process walks through the queue, processing and removing locales (see Issues).
2. For each locale, it walks through all of the pathIds and recomputes the errorStatus and example.
3. If the locale or any of its parents is added to the queue while that locale is being processed, processing of the locale restarts.
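One possible reading of this queue, sketched in Python under several assumptions: the parent of a locale is found by truncating at the last underscore (real CLDR inheritance has explicit exceptions), "restart" is taken to mean that the locale is put back in the queue behind its parents, and the run is single-threaded, so additions to the queue can only come from the `recompute` callback. `StaleLocales`, `process_queue`, and the callback signatures are all invented for illustration.

```python
def parent_of(locale):
    # Assumption: truncation-based inheritance, with "root" at the top.
    if locale == "root":
        return None
    head, sep, _tail = locale.rpartition("_")
    return head if sep else "root"


def ancestors(locale):
    result = []
    parent = parent_of(locale)
    while parent is not None:
        result.append(parent)
        parent = parent_of(parent)
    return result


class StaleLocales:
    """Insertion-ordered set in which children always follow their parents."""

    def __init__(self):
        self._items = {}  # dict keys keep insertion order

    def __len__(self):
        return len(self._items)

    def __contains__(self, locale):
        return locale in self._items

    def add(self, locale):
        if locale in self._items:
            return
        self._items[locale] = None
        # Move already-queued descendants behind the newly added locale.
        for queued in list(self._items):
            if locale in ancestors(queued):
                del self._items[queued]
                self._items[queued] = None

    def pop_first(self):
        locale = next(iter(self._items))
        del self._items[locale]
        return locale

    def snapshot(self):
        return list(self._items)


def process_queue(queue, path_ids, recompute):
    """Drain the queue; return locales in the order they were completed."""
    completed = []
    while len(queue):
        locale = queue.pop_first()
        for path_id in path_ids(locale):
            recompute(locale, path_id)  # errorStatus and example
            if locale in queue or any(a in queue for a in ancestors(locale)):
                queue.add(locale)  # restart later, after its parents
                break
        else:
            completed.append(locale)
    return completed
```

Because `add` keeps parents ahead of children, the locale at the front of the queue never has a queued ancestor, so a restart is only triggered by additions that arrive while that locale is being processed.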

## Issues

Issue: We could precompute the dependencies between paths, which are static. Then, if the winning value for path x changes, we know exactly which paths may need their error status and/or examples recomputed, and don’t have to walk them all. Note that changes to English or root may require error checking to be redone everywhere.
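As a rough illustration of the precomputed dependencies, the sketch below inverts a static "reads from" table into a "is read by" table, so that a change to one path yields only the paths worth rechecking. The path names and the `depends_on` table are made up for the example, and only direct dependencies are followed, on the assumption that error status and examples depend on winning values rather than on each other.

```python
from collections import defaultdict


def build_dependents(depends_on):
    """Invert path -> paths it reads into path -> paths that read it."""
    dependents = defaultdict(set)
    for path, sources in depends_on.items():
        for source in sources:
            dependents[source].add(path)
    return dependents


def paths_to_recheck(changed_path, dependents):
    # The changed path itself plus every path that reads its value.
    return {changed_path} | dependents.get(changed_path, set())


# Hypothetical static table: which paths each path's checks and examples read.
DEPENDS_ON = {
    "dateFormat/full": {"days/sun", "months/jan"},
    "dateFormat/short": {"months/jan"},
}
DEPENDENTS = build_dependents(DEPENDS_ON)
```

Here a change to `months/jan` would trigger a recheck of three paths, while a change to a path that nothing reads would recheck only that path.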

Issue: We could make root and English completely frozen, since the TC is responsible for all changes to them. Their items would then need to be changed with a manual data update. *Added [4005](http://unicode.org/cldr/trac/ticket/4005)*

Issue: Host on Google App Engine (http://code.google.com/appengine/docs/whatisgoogleappengine.html)? That would allow for better scalability, robustness, etc., and take load off of the Unicode server. *We’re investigating this at Google.*

Issue: The voter map changes occasionally. For new users, we don’t have to do anything. The only real change is if the voterStatus changes. In that case, we need to revisit all of the authorizedLocales. Because it is very infrequent, it is probably OK not to try to optimize (e.g. not to keep a back map of voter→pathId\*).

Issue: With multiple machines (or App Engine) we could shard the processing: divide up the locales by base language, and divvy them up among different machines. (Clumps would have to be slightly larger where we have sibling aliases.)
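A minimal sketch of the sharding idea, assuming the base language is the part of the locale ID before the first underscore and that sibling aliases are supplied as a simple mapping from one base language to the clump it should join. The function names, the alias table passed in the usage example, and the choice of CRC32 as a stable hash are all illustrative assumptions.

```python
import zlib
from collections import defaultdict


def base_language(locale):
    return locale.split("_", 1)[0]


def clump_key(locale, sibling_aliases):
    # Locales whose base language is aliased join the target's clump.
    base = base_language(locale)
    return sibling_aliases.get(base, base)


def assign_shards(locales, shard_count, sibling_aliases=None):
    """Map shard number -> list of locales, keeping each clump together."""
    sibling_aliases = sibling_aliases or {}
    shards = defaultdict(list)
    for locale in locales:
        key = clump_key(locale, sibling_aliases)
        # CRC32 is stable across processes, unlike Python's built-in hash().
        shard = zlib.crc32(key.encode("utf-8")) % shard_count
        shards[shard].append(locale)
    return dict(shards)
```

For example, `assign_shards(["hr", "hr_BA", "sr", "sh"], 3, {"sh": "sr"})` places `hr` and `hr_BA` on one machine and `sr` and `sh` on one machine, so a locale and its children never have to be recomputed across machine boundaries.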