---
title: Transform Fallback
---

# Transform Fallback

We need to more clearly describe the presumed lookup fallback for transforms:

## Code equivalence

- A lone script code or long script name is equivalent to the corresponding BCP 47 tag: Latn = Latin = und-Latn.
- "und" from BCP 47 is treated the same as the special code "any" in transform IDs.
- In the unlikely event of a collision between a special transform code (any, hex, fullwidth, etc.) and a BCP 47 language code, we have to figure out what to do. Initial suggestion: add "\_ZZ" to the language code.
- For the special codes, we should probably switch to aliases that have a low probability of collision, e.g. always more than 3 letters.

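The first two equivalences can be sketched as a small normalization step. This is only an illustration: the function name `canonical_code` and the `SCRIPT_NAMES` table are hypothetical, the table covers just a few scripts, and collision handling (the "\_ZZ" suggestion) is deliberately left out because it is still undecided.

```python
# Hypothetical alias table: long script names mapped to four-letter script codes.
# A real implementation would take these from CLDR data, not a hard-coded dict.
SCRIPT_NAMES = {"latin": "Latn", "cyrillic": "Cyrl", "greek": "Grek", "arabic": "Arab"}
SCRIPT_CODES = set(SCRIPT_NAMES.values())


def canonical_code(code: str) -> str:
    """Map equivalent spellings of a transform source/target to one form."""
    parts = code.replace("-", "_").split("_")
    if len(parts) == 1:
        lowered = parts[0].lower()
        # Long script name, e.g. "Latin" -> "Latn".
        if lowered in SCRIPT_NAMES:
            return SCRIPT_NAMES[lowered]
        # Bare "und" is treated the same as the special code "any".
        if lowered in ("und", "any"):
            return "any"
    # "und-Latn" is the same as the lone script code "Latn".
    if len(parts) == 2 and parts[0].lower() == "und" and parts[1].title() in SCRIPT_CODES:
        return parts[1].title()
    # Everything else is passed through with "_" as the separator.
    return "_".join(parts)
```

With this sketch, `canonical_code("Latin")`, `canonical_code("und-Latn")`, and `canonical_code("Latn")` all return the same string, so a lookup keyed on the canonical form treats them as one code.
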
## Language tag fallback

If the source or target is a Unicode language ID, then a fallback chain is followed, with some additions. For example, for az\_Arab\_IR the chain is:

1. az\_Arab\_IR
2. az\_Arab
3. az\_IR
4. az
5. Arab
6. Cyrl

The fallback additions are:

- We also fall back through the country (step 3). This is along the lines we've otherwise discussed for BCP 47 support, and we should clarify it in the spec.
- Once the language is reached, we fall back to script: first the specified script if there is one (step 5), then the likely script for the language (step 6, if different from step 5).

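A minimal sketch of how such a chain could be generated is shown below. The function name `fallback_chain` and the `LIKELY_SCRIPT` table are hypothetical; the table entries are chosen only to reproduce the chains on this page, and a real implementation would take the likely script from CLDR likely-subtags data, which may differ.

```python
# Hypothetical likely-script table, chosen to mirror the examples on this page.
# Real values should come from CLDR likely-subtags data.
LIKELY_SCRIPT = {"az": "Cyrl", "ru": "Cyrl", "el": "Grek"}


def fallback_chain(locale_id, likely_script=LIKELY_SCRIPT):
    """Return the fallback chain for a language ID such as 'az_Arab_IR'."""
    parts = locale_id.replace("-", "_").split("_")
    lang, rest = parts[0], parts[1:]
    # A 4-letter subtag is taken as the script; 2 letters or 3 digits as the region.
    script = next((p for p in rest if len(p) == 4 and p.isalpha()), None)
    region = next(
        (p for p in rest
         if (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit())),
        None,
    )

    chain = []
    if script and region:
        chain.append(f"{lang}_{script}_{region}")
    if script:
        chain.append(f"{lang}_{script}")
    if region:
        chain.append(f"{lang}_{region}")  # addition: fall back through the country
    chain.append(lang)
    if script:
        chain.append(script)              # addition: the specified script
    likely = likely_script.get(lang)
    if likely:
        chain.append(likely)              # addition: the likely script for the language
    # Drop duplicates (e.g. likely script equal to the specified one), keeping order.
    return list(dict.fromkeys(chain))
```

Calling `fallback_chain("az_Arab_IR")` with the table above yields the six entries listed earlier, in the same order.
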
## Laddered fallback

The source, target, and variant use "laddered" fallback. That is, in pseudocode:

- for variant in variant-chain
  - for target in target-chain
    - for source in source-chain
      - transform = lookup source-target/variant
      - if transform != null, return transform

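The nested loops can be written out as a runnable sketch. The registry shape used here (a dict keyed by `(source, target, variant)` tuples, with `None` standing for "no variant") and the name `laddered_lookup` are assumptions made for illustration, not an existing API.

```python
def laddered_lookup(registry, source_chain, target_chain, variant_chain):
    """Return the first transform found, walking the chains in laddered order.

    registry is assumed to be a dict keyed by (source, target, variant);
    a variant of None means the no-variant transform.
    """
    for variant in variant_chain:          # slowest-changing
        for target in target_chain:
            for source in source_chain:    # fastest-changing
                transform = registry.get((source, target, variant))
                if transform is not None:
                    return transform
    return None  # nothing matched anywhere in the ladder
```
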
For example, here is the chain for ru\_RU-el\_GR/BGN. I'm spacing out the source, target, and variant for clarity.

1. ru\_RU - el\_GR /BGN
2. ru - el\_GR /BGN
3. Cyrl - el\_GR /BGN
4. ru\_RU - el /BGN
5. ru - el /BGN
6. Cyrl - el /BGN
7. ru\_RU - Grek /BGN
8. ru - Grek /BGN
9. Cyrl - Grek /BGN
10. ru\_RU - el\_GR
11. ru - el\_GR
12. Cyrl - el\_GR
13. ru\_RU - el
14. ru - el
15. Cyrl - el
16. ru\_RU - Grek
17. ru - Grek
18. Cyrl - Grek

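The ordering of the 18 queries can be reproduced mechanically, which may be handy as a test fixture. In this sketch the helper names `laddered_order` and `format_id` are hypothetical, and the three chains are written out by hand rather than derived from locale data.

```python
from itertools import product


def laddered_order(source_chain, target_chain, variant_chain):
    """List (source, target, variant) triples in the order they are queried."""
    # product() varies its last argument fastest, so the source changes
    # fastest, then the target, and the variant changes slowest.
    return [(s, t, v) for v, t, s in product(variant_chain, target_chain, source_chain)]


def format_id(source, target, variant):
    """Render a triple as a transform ID; a variant of None means no variant."""
    return f"{source}-{target}" + (f"/{variant}" if variant else "")


order = laddered_order(
    ["ru_RU", "ru", "Cyrl"],    # source chain
    ["el_GR", "el", "Grek"],    # target chain
    ["BGN", None],              # variant chain: BGN first, then no variant
)
for step, triple in enumerate(order, start=1):
    print(step, format_id(*triple))
```
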
**Comments:**

1. The above is not how the ICU code works. That code actually discards the variant if an exact match is not found, so steps 2-9 are never queried. I think that is definitely a mistake.
2. Personally, I think the above chain might not be optimal: it would be better for BGN to be stronger than a country difference, but not as strong as a script difference. However, in conversations with Markus, I was convinced that a simple story for how it works is probably best, and the above is simpler to explain and easier to implement.

## Model Requirements

We have the implicit requirement that no variant is populated unless there is also a no-variant version. We need to make sure that this is maintained by the build tools and/or tests. That is, if we have fa-Latn/BGN, we should have fa-Latn as well. The other piece of this is that we should give every no-variant version a variant name, so that people can be explicit about the variant even if we change the default later on. The upshot is that the no-variant version should always be just an alias to one of the variant versions. Operationally, that means the following actions:

Case 1. Only fa-Latn/BGN exists. Add an alias from fa-Latn to fa-Latn/BGN.

Case 2. Only foo-Latn exists. Rename it to foo-Latn/SOMETHING, and then apply Case 1.

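A build-time check for the two cases might look like the sketch below. The function `plan_actions`, the action tuples, and the `new_variant_names` argument are all hypothetical; in particular, when several variants exist the sketch aliases to the first one listed, which is an arbitrary choice that a real tool would replace with an explicit default.

```python
def plan_actions(ids, new_variant_names):
    """Plan the alias/rename actions needed to satisfy the model requirement.

    ids: transform IDs such as 'fa-Latn/BGN' or 'foo-Latn'.
    new_variant_names: hypothetical mapping from a base ID to the variant
        name it should receive in Case 2 (a KeyError means none was chosen).
    """
    variants = {}   # base ID -> variant names, in input order
    plain = set()   # base IDs that exist without a variant
    for tid in ids:
        base, sep, variant = tid.partition("/")
        variants.setdefault(base, [])
        if sep:
            variants[base].append(variant)
        else:
            plain.add(base)

    actions = []
    for base, names in variants.items():
        if names and base not in plain:
            # Case 1: only variant versions exist; alias the bare ID to one.
            actions.append(("alias", base, f"{base}/{names[0]}"))
        elif not names:
            # Case 2: only the bare ID exists; name it, then alias as in Case 1.
            named = f"{base}/{new_variant_names[base]}"
            actions.append(("rename", base, named))
            actions.append(("alias", base, named))
    return actions
```

For the input `["fa-Latn/BGN", "foo-Latn"]` with `{"foo-Latn": "SOMETHING"}`, the sketch plans one alias for fa-Latn and a rename followed by an alias for foo-Latn. It cannot tell whether an existing no-variant entry is already an alias or a separate rule set; that would need information from the actual data files.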