## Unicode Technical Standard #35

# Unicode Locale Data Markup Language (LDML)<br/>Part 7: Keyboards

<!-- HTML: no th -->
<table><tbody>
<tr><td>Version</td><td>40</td></tr>
<tr><td>Editors</td><td>Steven Loomis (<a href="mailto:srl@icu-project.org">srl@icu-project.org</a>) and <a href="tr35.html#Acknowledgments">other CLDR committee members</a></td></tr>
</tbody></table>

For the full header, summary, and status, see [Part 1: Core](tr35.md).

#### _Important Note_

> The CLDR [Keyboard Workgroup](https://cldr.unicode.org/index/keyboard-workgroup) is currently
> developing major changes to the CLDR keyboard specification. These changes are targeted for
> CLDR version 41. Please see [CLDR-15034](https://unicode-org.atlassian.net/browse/CLDR-15034) for
> the latest information.

### _Summary_

This document describes parts of an XML format (_vocabulary_) for the exchange of structured locale data. This format is used in the [Unicode Common Locale Data Repository](https://unicode.org/cldr/).

This is a partial document, describing keyboard mappings. For the other parts of the LDML see the [main LDML document](tr35.md) and the links above.

### _Status_

_This is a draft document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress._

> _**A Unicode Technical Standard (UTS)** is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS._

_Please submit corrigenda and other comments with the CLDR bug reporting form [[Bugs](tr35.md#Bugs)]. Related information that is useful in understanding this document is found in the [References](tr35.md#References). For the latest version of the Unicode Standard see [[Unicode](tr35.md#Unicode)]. For a list of current Unicode Technical Reports see [[Reports](tr35.md#Reports)]. For more information about versions of the Unicode Standard, see [[Versions](tr35.md#Versions)]._

## <a name="Parts" href="#Parts">Parts</a>

The LDML specification is divided into the following parts:

*   Part 1: [Core](tr35.md#Contents) (languages, locales, basic structure)
*   Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
*   Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
*   Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
*   Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
*   Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
*   Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)

## <a name="Contents" href="#Contents">Contents of Part 7, Keyboards</a>

*   1 [Keyboards](#Introduction)
*   2 [Goals and Non-goals](#Goals_and_Nongoals)
*   3 [Definitions](#Definitions)
*   4 [File and Directory Structure](#File_and_Dir_Structure)
*   5 [Element Hierarchy - Layout File](#Element_Heirarchy_Layout_File)
    *   5.1 [Element: keyboard](#Element_Keyboard)
    *   5.2 [Element: version](#Element_version)
    *   5.3 [Element: generation](#Element_generation)
    *   5.4 [Element: info](#Element_info)
    *   5.5 [Element: names](#Element_names)
    *   5.6 [Element: name](#Element_name)
    *   5.7 [Element: settings](#Element_settings)
    *   5.8 [Element: keyMap](#Element_keyMap)
        *   Table: [Possible Modifier Keys](#Possible_Modifier_Keys)
    *   5.9 [Element: map](#Element_map)
        *   5.9.1 [Element: flicks, flick](#Element_flicks)
    *   5.10 [Element: import](#Element_import)
    *   5.11 [Element: displayMap](#Element_displayMap)
    *   5.12 [Element: display](#Element_display)
    *   5.13 [Element: layer](#Element_layer)
    *   5.14 [Element: row](#Element_row)
    *   5.15 [Element: switch](#Element_switch)
    *   5.16 [Element: vkeys](#Element_vkeys)
    *   5.17 [Element: vkey](#Element_vkey)
    *   5.18 [Element: transforms](#Element_transforms)
    *   5.19 [Element: transform](#Element_transform)
    *   5.20 [Element: reorders, reorder](#Element_reorder)
    *   5.21 [Element: transform final](#Element_final)
    *   5.22 [Element: backspaces](#Element_backspaces)
    *   5.23 [Element: backspace](#Element_backspace)
*   6 [Element Hierarchy - Platform File](#Element_Heirarchy_Platform_File)
    *   6.1 [Element: platform](#Element_platform)
    *   6.2 [Element: hardwareMap](#Element_hardwareMap)
    *   6.3 [Element: map](#Element_hardwareMap_map)
*   7 [Invariants](#Invariants)
*   8 [Data Sources](#Data_Sources)
    *   Table: [Key Map Data Sources](#Key_Map_Data_Sources)
*   9 [Keyboard IDs](#Keyboard_IDs)
    *   9.1 [Principles for Keyboard Ids](#Principles_for_Keyboard_Ids)
*   10 [Platform Behaviors in Edge Cases](#Platform_Behaviors_in_Edge_Cases)

<a name="Keyboards"></a>
## 1 <a name="Introduction" href="#Introduction">Keyboards</a>

The CLDR keyboard format provides for the communication of keyboard mapping data between different modules, and the comparison of data across different vendors and platforms. The standardized identifier for keyboards can be used to communicate, internally or externally, a request for a particular keyboard mapping that is to be used to transform either text or keystrokes. The corresponding data can then be used to perform the requested actions.

For example, a web-based virtual keyboard may transform text in the following way. Suppose the user types a key that produces a "W" on a qwerty keyboard. A web-based tool using an azerty virtual keyboard can map that text ("W") to the text that would have resulted from typing a key on an azerty keyboard, by transforming "W" to "Z". Such transforms are in fact performed in existing web applications.

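The text-to-text transformation described above can be sketched in Python as follows. This is only an illustration, not part of the specification: the two lookup tables are hypothetical, heavily abridged stand-ins for real layout data, and the function name `qwerty_to_azerty` is invented for this example.

```python
# Hypothetical, heavily abridged tables covering three keys of the top letter row.
QWERTY_CHAR_TO_ISO = {"Q": "D01", "W": "D02", "E": "D03"}
AZERTY_ISO_TO_CHAR = {"D01": "A", "D02": "Z", "D03": "E"}


def qwerty_to_azerty(text: str) -> str:
    """Re-map text typed on a qwerty layout to what the same keys give on azerty."""
    result = []
    for char in text:
        iso = QWERTY_CHAR_TO_ISO.get(char)
        # Characters with no known key position pass through unchanged.
        result.append(AZERTY_ISO_TO_CHAR.get(iso, char))
    return "".join(result)
```
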
The data can also be used in analysis of the capabilities of different keyboards. It also allows better interoperability by making it easier for keyboard designers to see which characters are generally supported on keyboards for given languages.

To illustrate this specification, here is an abridged layout representing the English US 101 keyboard on the Mac OSX operating system (with an inserted long-press example). For more complete examples, and information collected about keyboards, see keyboard data in XML.

```xml
<keyboard locale="en-t-k0-osx">
    <version platform="10.4" number="$Revision: 8294 $" />
    <names>
        <name value="U.S." />
    </names>
    <keyMap>
        <map iso="E00" to="`" />
        <map iso="E01" to="1" />
        <map iso="D01" to="q" />
        <map iso="D02" to="w" />
        <map iso="D03" to="e" longPress="é è ê ë" />
        …
    </keyMap>
    <keyMap modifiers="caps">
        <map iso="E00" to="`" />
        <map iso="E01" to="1" />
        <map iso="D01" to="Q" />
        <map iso="D02" to="W" />
        …
    </keyMap>
    <keyMap modifiers="opt">
        <map iso="E00" to="`" />
        <map iso="E01" to="¡" /> <!-- key=1 -->
        <map iso="D01" to="œ" /> <!-- key=Q -->
        <map iso="D02" to="∑" /> <!-- key=W -->
        …
    </keyMap>
    <transforms type="simple">
        <transform from="` " to="`" />
        <transform from="`a" to="à" />
        <transform from="`A" to="À" />
        <transform from="´ " to="´" />
        <transform from="´a" to="á" />
        <transform from="´A" to="Á" />
        <transform from="˜ " to="˜" />
        <transform from="˜a" to="ã" />
        <transform from="˜A" to="Ã" />
        …
    </transforms>
</keyboard>
```

And its associated platform file (which includes the hardware mapping):

```xml
<platform id="osx">
    <hardwareMap>
        <map keycode="0" iso="C01" />
        <map keycode="1" iso="C02" />
        <map keycode="6" iso="B01" />
        <map keycode="7" iso="B02" />
        <map keycode="12" iso="D01" />
        <map keycode="13" iso="D02" />
        <map keycode="18" iso="E01" />
        <map keycode="50" iso="E00" />
    </hardwareMap>
</platform>
```

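To show how the two files work together, here is a minimal Python sketch that resolves a hardware key code to an ISO position through the platform file and then to an output string through the layout file. The XML strings are abridged copies of the examples above; the function names (`load_hardware_map`, `load_key_map`, `key_output`) are hypothetical, and matching the `modifiers` attribute by exact string comparison is a simplifying assumption rather than the full matching rule.

```python
import xml.etree.ElementTree as ET

PLATFORM_XML = """
<platform id="osx">
    <hardwareMap>
        <map keycode="12" iso="D01" />
        <map keycode="13" iso="D02" />
    </hardwareMap>
</platform>
"""

LAYOUT_XML = """
<keyboard locale="en-t-k0-osx">
    <keyMap>
        <map iso="D01" to="q" />
        <map iso="D02" to="w" />
    </keyMap>
    <keyMap modifiers="caps">
        <map iso="D01" to="Q" />
        <map iso="D02" to="W" />
    </keyMap>
</keyboard>
"""


def load_hardware_map(xml_text):
    """Return a dict from integer key code to ISO position."""
    root = ET.fromstring(xml_text)
    return {int(m.get("keycode")): m.get("iso") for m in root.iter("map")}


def load_key_map(xml_text, modifiers=None):
    """Return the iso -> output dict of the keyMap whose modifiers attribute
    equals `modifiers` (None selects the keyMap without that attribute)."""
    root = ET.fromstring(xml_text)
    for key_map in root.findall("keyMap"):
        if key_map.get("modifiers") == modifiers:
            return {m.get("iso"): m.get("to") for m in key_map.findall("map")}
    return {}


def key_output(keycode, modifiers=None):
    """Resolve key code -> ISO position -> output; None if either step fails."""
    iso = load_hardware_map(PLATFORM_XML).get(keycode)
    return load_key_map(LAYOUT_XML, modifiers).get(iso)
```

With these abridged tables, `key_output(13)` follows key code 13 to position D02 and then to the output of D02 in the key map that has no `modifiers` attribute.
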
* * *

## 2 <a name="Goals_and_Nongoals" href="#Goals_and_Nongoals">Goals and Non-goals</a>

Some goals of this format are:

1. Make the XML as readable as possible.
2. Faithfully represent keyboard data from major platforms: it should be possible to create a functionally-equivalent data file (such that given any input, it can produce the same output).
3. Maximize commonality in the data across platforms to make comparison easy.

Some non-goals (outside the scope of the format) currently are:

1. Display names or symbols for keycaps (e.g., the German name for "Return"). If that were added to LDML, it would be in a different structure, outside the scope of this section.
2. Advanced IME features, handwriting recognition, etc.
3. Roundtrip mappings: the ability to recover precisely the same format as an original platform's representation. In particular, the internal structure may have no relation to the internal structure of external keyboard source data; the only goal is functional equivalence.

Note: During development of this section, it was considered whether the modifier RAlt (=AltGr) should be merged with Option. In the end, they were kept separate, but for comparison across platforms implementers may choose to unify them.

Note that in parts of this document, the format `@x` is used to indicate the _attribute_ **x**.

* * *

## 3 <a name="Definitions" href="#Definitions">Definitions</a>

**Arrangement** is the term used to describe the relative position of the rectangles that represent keys, either physically or virtually. A physical keyboard has a static arrangement while a virtual keyboard may have a dynamic arrangement that changes per language and/or layer. While the arrangement of keys on a keyboard may be fixed, the mapping of those keys may vary.

**Base character:** The character emitted by a particular key when no modifiers are active. In ISO terms, this is group 1, level 1.

**Base map:** A mapping from the ISO positions to the base characters. There is only one base map per layout. The characters on this map are output when no modifier keys are used.

**Core keyboard layout:** Also known as the “alpha” block. The primary set of key values on a keyboard that are used for typing the target language of the keyboard. For example, the three rows of letters on a standard US QWERTY keyboard (QWERTYUIOP, ASDFGHJKL, ZXCVBNM) together with the most significant punctuation keys. Usually this equates to the minimal keyset for a language as seen on mobile phone keyboards.

**Hardware map:** A mapping between key codes and ISO layout positions.

**Input Method Editor (IME):** A component or program that supports input of large character sets. Typically, IMEs employ contextual logic and candidate UI to identify the Unicode characters intended by the user.

**ISO position:** The corresponding position of a key using the ISO layout convention where rows are identified by letters and columns are identified by numbers. For example, "D01" corresponds to the "Q" key on a US keyboard. For the purposes of this document, an ISO layout position is depicted by a one-letter row identifier followed by a two-digit column number (like "B03", "E12" or "C00"). The following diagram depicts a typical US keyboard layout superimposed with the ISO layout indicators. Note that the number of keys and their physical placement relative to each other in this diagram is irrelevant; what matters is their logical placement using the ISO convention:

![keyboard layout example showing ISO key numbering](images/keyPositions.png)

One may also extend the notion of the ISO layout to support keys that don't map directly to the diagram above (such as on an Android device; see the diagram below). Per the ISO standard, the space bar is mapped to "A03", so the period and comma keys are mapped to "A02" and "A04" respectively based on their relative position to the space bar. Also note that the "E" row does not exist on the Android keyboard.

![keyboard layout example showing extension of ISO key numbering](images/androidKeyboard.png)

If it becomes necessary in the future, the format could extend the ISO layout to support keys that are located to the left of the "00" column by using negative column numbers "-01", "-02" and so on, or 100's complement "99", "98",...

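As an illustration of the one-letter row plus two-digit column form described above, the following Python sketch splits an ISO position into its row and column. The function name `parse_iso_position` is hypothetical, restricting rows to uppercase ASCII letters is an assumption, and the possible future extensions (negative or complement column numbers) are deliberately not handled.

```python
import re

# One uppercase ASCII row letter followed by exactly two ASCII digits.
_ISO_POSITION = re.compile(r"([A-Z])([0-9]{2})")


def parse_iso_position(value: str):
    """Split an ISO position such as "D01" into (row, column)."""
    match = _ISO_POSITION.fullmatch(value)
    if match is None:
        raise ValueError(f"not an ISO position: {value!r}")
    return match.group(1), int(match.group(2))
```
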
**Key:** A key on a physical keyboard.

**Key code:** The integer code sent to the application on pressing a key.

**Key map:** The basic mapping between ISO positions and the output characters for each set of modifier combinations associated with a particular layout. There may be multiple key maps for each layout.

**Keyboard:** The physical keyboard.

**Keyboard layout:** A layout is the overall keyboard configuration for a particular locale. Within a keyboard layout, there is a single base map, one or more key maps and zero or more transforms.

**Layer** is an arrangement of keys on a virtual keyboard. Because a virtual keyboard is often not meant to be operated with two hands, holding down a modifier key while pressing another key is impractical. Modifier keys are therefore made sticky: pressing one changes the visual representation, and possibly the arrangement, of the keys, and the user then presses the desired key. Each such visual representation is a layer. Thus a virtual keyboard is made up of a set of layers.

**Long-press key:** Also known as a “child key”. A secondary key that is invoked from a top level key on a software keyboard. Secondary keys typically provide access to variants of the top level key, such as accented variants (a => á, à, ä, ã).

**Modifier:** A key that is held to change the behavior of a keyboard. For example, the "Shift" key allows access to upper-case characters on a US keyboard. Other modifier keys include, but are not limited to, Ctrl, Alt, Option, Command and Caps Lock.

**Physical keyboard** is a keyboard that has individual keys that are pressed. Each key has a unique identifier and the arrangement doesn't change, even if the mapping of those keys does.

**Transform:** A transform is an element that specifies a set of conversions from sequences of code points into one (or more) other code points. For example, in most Latin keyboards hitting the "^" dead-key followed by the "e" key produces "ê".

**Virtual keyboard** is a keyboard that is rendered on a surface, typically a touch surface. It has a dynamic arrangement and contrasts with a physical keyboard. This term has many synonyms: touch keyboard, software keyboard, SIP (Software Input Panel). This usage differs from other uses of the term virtual keyboard, such as an on-screen keyboard for reference or accessibility data entry.

### 3.1 <a name="Escaping" href="#Escaping">Escaping</a>

When explicitly specified, attributes can contain escaped characters. This specification uses two methods of escaping, the _UnicodeSet_ notation and the `\u{...}` notation.

The _UnicodeSet_ notation is described in [UTS#35 section 5.3.3](tr35.md#Unicode_Sets) and allows for comprehensive character matching, including by character range, properties, names, or codepoints. Currently, the following attributes allow _UnicodeSet_ notation:

* `from`, `before`, `after` on the `<transform>` element
* `from`, `before`, `after` on the `<reorder>` element
* `from`, `before`, `after` on the `<backspace>` element

The `\u{...}` notation, a subset of hex notation, is described in [UTS#18 section 1.1](http://www.unicode.org/reports/tr18/#Hex_notation). It can refer to one or multiple individual codepoints. Currently, the following attributes allow the `\u{...}` notation:

* `to`, `longPress`, `multitap`, `hint` on the `<map>` element
* `to` on the `<transform>` element
* `to` on the `<backspace>` element

Characters in the general categories Combining Mark (M), Control (Cc), and Format (Cf), as well as whitespace other than space, should be encoded using one of the notations above as appropriate.

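A minimal Python sketch of decoding the `\u{...}` notation follows. It assumes that multiple code points inside one pair of braces are separated by spaces, as in the UTS #18 hex notation; the function name `unescape` is hypothetical, and no validation of the code point range is attempted.

```python
import re

# \u{...} containing one or more hex code points separated by spaces.
_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f ]+)\}")


def unescape(value: str) -> str:
    """Replace every \\u{...} escape in an attribute value with its characters."""
    def replace(match):
        return "".join(chr(int(cp, 16)) for cp in match.group(1).split())
    return _ESCAPE.sub(replace, value)
```

For instance, a `to` value written as `\u{0301}` decodes to the single character U+0301 COMBINING ACUTE ACCENT, which would otherwise be hard to see in the XML source.
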
* * *

## 4 <a name="File_and_Dir_Structure" href="#File_and_Dir_Structure">File and Directory Structure</a>

Each platform has its own directory, where a "platform" is a designation for a set of keyboards available from a particular source, such as Windows or ChromeOS. This directory name is the platform name (see Table 2 later in this document). Within this directory there are two types of files:

1. A single platform file (see XML structure for Platform file). This file includes a mapping of hardware key codes to the ISO layout positions. It is also open to expansion for any configuration elements that are valid across the whole platform and that are not layout specific. This file is simply called `_platform.xml`.
2. Multiple layout files named by their locale identifiers (e.g., `lt-t-k0-chromeos.xml` or `ne-t-k0-windows.xml`).

Keyboard data that is not supported on a given platform, but intended for use with that platform, may be added to the directory `/und/`. For example, there could be a file `/und/lt-t-k0-chromeos.xml`, where the data is intended for use with ChromeOS, but does not reflect data that is distributed as part of a standard ChromeOS release.

* * *

## 5 <a name="Element_Heirarchy_Layout_File" href="#Element_Heirarchy_Layout_File">Element Hierarchy - Layout File</a>

### 5.1 <a name="Element_Keyboard" href="#Element_Keyboard">Element: keyboard</a>

This is the top level element. All other elements defined below are under this element.

**Syntax**

```xml
<keyboard locale="{locale ID}">
    {definition of the layout as described by the elements defined below}
</keyboard>
```

> <small>
>
> Parents: _none_
> Children: [version](#Element_version), [~~generation~~](#Element_generation), [info](#Element_info), [names](#Element_names), [settings](#Element_settings), [import](#Element_import), [keyMap](#Element_keyMap), [displayMap](#Element_displayMap), [layer](#Element_layer), [vkeys](#Element_vkeys), [transforms](#Element_transforms), [reorders](#Element_reorder), [backspaces](#Element_backspaces)
> Occurrence: required, single
>
> </small>

_Attribute:_ `locale` (required)

This mandatory attribute represents the locale of the keyboard using Unicode locale identifiers (see [LDML](tr35.md)), for example `"el"` for Greek. Sometimes, the locale may not specify the base language. For example, a Devanagari keyboard for many languages could be specified by the BCP 47 code `"und-Deva"`. For details, see [Keyboard IDs](#Keyboard_IDs).

**Example** (for illustrative purposes only, not indicative of the real data)

```xml
<keyboard locale="ka-t-k0-qwerty-windows">
    …
</keyboard>
```
```xml
<keyboard locale="fr-CH-t-k0-android">
    …
</keyboard>
```

* * *

### 5.2 <a name="Element_version" href="#Element_version">Element: version</a>

Element used to keep track of the source data version.

**Syntax**

```xml
<version platform=".." number=".." />
```

> <small>
>
> Parents: [keyboard](#Element_Keyboard)
> Children: _none_
> Occurrence: required, single
>
> </small>

_Attribute:_ `platform` (required)

> The platform source version. Specifies what version of the platform the data is from. For example, data from Mac OSX 10.4 would be specified as `platform="10.4"`. For platforms that have unstable version numbers which change frequently (like Linux), this field is set to an integer representing the iteration of the data starting with `"1"`. This number would only increase if there were any significant changes in the keyboard data.

_Attribute:_ `number` (required)

> The data revision version. The attribute value must start with `$Revision` and end with `$`.

_Attribute:_ `cldrVersion` (fixed by DTD)

> The CLDR specification version that is associated with this data file. This value is fixed and is inherited from the [DTD file](https://github.com/unicode-org/cldr/tree/master/keyboards/dtd) and therefore does not show up directly in the XML file.

**Example**

```xml
<keyboard locale="..-osx">
    …
    <version platform="10.4" number="1"/>
    …
</keyboard>
```

* * *

### 5.3 ~~<a name="Element_generation" href="#Element_generation">Element: generation</a>~~

The `generation` element is now deprecated. It was used to keep track of the generation date of the data.

* * *

### 5.4 <a name="Element_info" href="#Element_info">Element: info</a>

Element containing informative properties about the layout, for displaying in user interfaces etc.

**Syntax**

```xml
<info [author="{author}"]
      [normalization="{form}"]
      [layout="{hint of the layout}"]
      [indicator="{short identifier}"] />
```

> <small>
>
> Parents: [keyboard](#Element_Keyboard)
> Children: _none_
> Occurrence: optional, single
>
> </small>

_Attribute:_ `author` (optional)

> The `author` attribute contains the name of the author of the layout file.

_Attribute:_ `normalization` (optional)

> The `normalization` attribute describes the intended normalization form of the keyboard layout output. The valid values are `NFC`, `NFD` or `other`.
> An example use case is helping a user choose between two otherwise identical layouts, one producing output in Normalization Form C and the other in Normalization Form D.

_Attribute:_ `layout` (optional)

> The `layout` attribute describes the layout pattern, such as QWERTY, DVORAK, INSCRIPT, etc. It is typically used to distinguish various layouts for the same language.

_Attribute:_ `indicator` (optional)

> The `indicator` attribute describes a short string, such as US or SI9, to be shown in the indicator of the currently selected layout.
> Typically, this is shown on a UI element that allows switching keyboard layouts and/or input languages.

* * *

### 5.5 <a name="Element_names" href="#Element_names">Element: names</a>

Element used to store any names given to the layout by the platform.

**Syntax**

```xml
<names>
    {set of name elements}
</names>
```

> <small>
>
> Parents: [keyboard](#Element_Keyboard)
> Children: [name](#Element_name)
> Occurrence: required, single
>
> </small>

### 5.6 <a name="Element_name" href="#Element_name">Element: name</a>

A single name given to the layout by the platform.

**Syntax**

```xml
<name value=".." />
```

> <small>
>
> Parents: [names](#Element_names)
> Children: _none_
> Occurrence: required, multiple
>
> </small>

_Attribute:_ `value` (required)

> The name of the layout.

**Example**

```xml
<keyboard locale="bg-t-k0-windows-phonetic-trad">
    …
    <names>
        <name value="Bulgarian (Phonetic Traditional)" />
    </names>
    …
</keyboard>
```

* * *

### 5.7 <a name="Element_settings" href="#Element_settings">Element: settings</a>

An element used to keep track of layout-specific settings. This element may or may not show up on a layout. These settings reflect the normal practice on the platform. However, an implementation using the data may customize the behavior. For example, for `transformFailure` the implementation could ignore the setting, or modify the text buffer in some other way (such as by emitting backspaces).

**Syntax**

```xml
<settings [fallback="omit"] [transformFailure="omit"] [transformPartial="hide"] />
```

> <small>
>
> Parents: [keyboard](#Element_Keyboard)
> Children: _none_
> Occurrence: optional, single
>
> </small>

_Attribute:_ `fallback="omit"` (optional)

> The presence of this attribute means that when a modifier key combination goes unmatched, no output is produced. The default behavior (when this attribute is not present) is to fall back to the base map when the modifier key combination goes unmatched.

If this attribute is present, it must have a value of `omit`.

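The fallback rule can be sketched in Python as below. The data model is an assumption made for this example only: key maps are held in a dict keyed by the exact set of active modifiers, with the empty set standing for the base map. The function name `resolve` and its parameters are hypothetical.

```python
def resolve(key_maps, active_modifiers, iso, fallback_omit=False):
    """Return the output for `iso` under the active modifiers, or None.

    key_maps maps a frozenset of modifier names to an iso -> output dict;
    the base map is stored under the empty frozenset.
    """
    key_map = key_maps.get(frozenset(active_modifiers))
    if key_map is None:
        if fallback_omit:
            # fallback="omit": an unmatched combination produces no output.
            return None
        # Default behavior: fall back to the base map.
        key_map = key_maps.get(frozenset(), {})
    return key_map.get(iso)
```
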
_Attribute:_ `transformFailure="omit"` (optional)

> This attribute describes the behavior of a transform when it is escaped (see the `transform` element in the Layout file for more information). A transform is escaped when it can no longer continue due to the entry of an invalid key. For example, suppose the following set of transforms is valid:
>
> ^e → ê
>
> ^a → â

Suppose a user now enters the "\^" key. Then "\^" is stored in a buffer and may or may not be shown to the user (see the `transformPartial` attribute).

If a user now enters "d", then the transform has failed and there are two options for output.

1. default behavior - "^d"

2. omit - "" (nothing and the buffer is cleared)

The default behavior (when this attribute is not present) is to emit the contents of the buffer upon failure of a transform.

If this attribute is present, it must have a value of `omit`.

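The two outcomes above can be simulated with a short Python sketch. It is a simplified model under stated assumptions: transforms are plain string pairs (no _UnicodeSet_ matching), a complete match fires immediately even if a longer transform could still follow, and a pending buffer is not flushed at the end of input. The names `TRANSFORMS` and `type_keys` are hypothetical.

```python
# The two transforms from the example above.
TRANSFORMS = {"^e": "ê", "^a": "â"}


def type_keys(keys, transform_failure_omit=False):
    """Return the text emitted for a sequence of key outputs."""
    output = []
    buffer = ""
    for key in keys:
        candidate = buffer + key
        if candidate in TRANSFORMS:
            # The transform completed: emit its result.
            output.append(TRANSFORMS[candidate])
            buffer = ""
        elif any(source.startswith(candidate) for source in TRANSFORMS):
            # Still a possible prefix of some transform: keep buffering.
            buffer = candidate
        elif buffer and transform_failure_omit:
            # transformFailure="omit": emit nothing and clear the buffer.
            buffer = ""
        else:
            # Default: emit the buffer contents followed by the new key.
            output.append(candidate)
            buffer = ""
    return "".join(output)
```
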
_Attribute:_ `transformPartial="hide"` (optional)

> This attribute describes the behavior of the system while in a transform. When this attribute is present, the values of the buffer are not shown as the user is typing a transform (this behavior can be seen on Windows or Linux platforms).

By default (when this attribute is not present), the values of the buffer are shown as the user is typing a transform (this behavior can be seen on the Mac OSX platform).

If this attribute is present, it must have a value of `hide`.

**Example**

```xml
<keyboard locale="bg-t-k0-windows-phonetic-trad">
    …
    <settings fallback="omit" transformPartial="hide" />
    …
</keyboard>
```

Indicates that:

1.  When a modifier combination goes unmatched, do not output anything when a key is pressed.
2.  If a transform is escaped, output the contents of the buffer.
3.  During a transform, hide the contents of the buffer as the user is typing.

* * *

### 5.8 <a name="Element_keyMap" href="#Element_keyMap">Element: keyMap</a>

This element defines the group of mappings for all the keys that use the same set of modifier keys. It contains one or more map elements.

**Syntax**

```xml
<keyMap [modifiers="{Set of Modifier Combinations}"]>
    {a set of map elements}
</keyMap>
```

> <small>
>
> Parents: [keyboard](#Element_Keyboard)
> Children: [map](#Element_map), [flicks](#Element_flicks)
> Occurrence: required, multiple
>
> </small>

_Attribute:_ `modifiers` (optional)

> A set of modifier combinations that cause this key map to be "active". Each combination is separated by a space. The interpretation is that there is a match if any of the combinations match, that is, they are ORed. Therefore, the order of the combinations within this attribute does not matter.

> A combination is simply a concatenation of words to represent the simultaneous activation of one or more modifier keys. The order of the modifier keys within a combination does not matter, although "don't care" cases are generally added to the end of the string for readability (see next paragraph). For example: `"cmd+caps"` represents the Caps Lock and Command modifier key combination. Some keys have right or left variant keys, specified by a 'R' or 'L' suffix. For example: `"ctrlR+caps"` would represent the Right-Control and Caps Lock combination. For simplicity, the presence of a modifier without a 'R' or 'L' suffix means that either its left or right variants are valid. So `"ctrl+caps"` represents the same as `"ctrlL+ctrlR?+caps ctrlL?+ctrlR+caps"`.

A modifier key may be further specified to be in a "don't care" state using the '?' suffix. The "don't care" state simply means that the preceding modifier key may be either ON or OFF. For example `"ctrl+shift?"` could be expanded into `"ctrl ctrl+shift"`.

Within a combination, the presence of a modifier WITHOUT the '?' suffix indicates this key MUST be on. The converse is also true: the absence of a modifier key means it MUST be off for the combination to be active.

Here is an exhaustive list of all possible modifier keys:

##### <a name="Possible_Modifier_Keys" href="#Possible_Modifier_Keys">Possible Modifier Keys</a>

| Modifier Keys |          | Comments                        |
|---------------|----------|---------------------------------|
| `altL`        | `altR`   | xAlty → xAltR+AltL? xAltR?AltLy |
| `ctrlL`       | `ctrlR`  | ditto for Ctrl                  |
| `shiftL`      | `shiftR` | ditto for Shift                 |
| `optL`        | `optR`   | ditto for Opt                   |
| `caps`        |          | Caps Lock                       |
| `cmd`         |          | Command on the Mac              |

All sets of modifier combinations within a layout are disjoint, with no overlap between the key maps. That is, for every possible modifier combination, there is at most a single match within the layout file. There are thus never multiple matches. If no exact match is available, the match falls back to the base map unless the `fallback="omit"` attribute in the `settings` element is set, in which case there would be no output at all.

**Example**

To illustrate, the following example produces an invalid layout because pressing the "Ctrl" modifier key produces an indeterminate result:

```xml
<keyMap modifiers="ctrl+shift?">
    …
</keyMap>
```

```xml
<keyMap modifiers="ctrl">
    …
</keyMap>
```

Modifier Examples:

```xml
<keyMap modifiers="cmd?+opt+caps?+shift" />
```

Caps-Lock may be ON or OFF, Option must be ON, Shift must be ON and Command may be ON or OFF.

```xml
<keyMap modifiers="shift caps" />
```

Either Caps-Lock must be ON or Shift must be ON. Because a modifier that is absent from a combination must be off, the case where both are ON is not matched.

If the `modifiers` attribute is not present on a `keyMap` then that particular key map is the base map.

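The modifier rules in this section can be sketched in Python by expanding a `modifiers` attribute into the explicit sets of pressed keys it matches. This is an illustrative model, not a reference implementation: the function names are hypothetical, and the set of modifiers treated as having left and right variants (`alt`, `ctrl`, `shift`, `opt`) is taken from the table above.

```python
from itertools import product

# Modifiers that have left ("L") and right ("R") variants, per the table above.
SIDED = {"alt", "ctrl", "shift", "opt"}


def _states(token):
    """Return the alternative sets of pressed keys that satisfy one token."""
    optional = token.endswith("?")
    name = token.rstrip("?")
    if name in SIDED:
        left, right = name + "L", name + "R"
        # Without a suffix, the left key, the right key, or both may be down.
        on = [frozenset({left}), frozenset({right}), frozenset({left, right})]
    else:
        on = [frozenset({name})]
    # A "don't care" modifier additionally allows the OFF state.
    return on + [frozenset()] if optional else on


def expand_modifiers(attribute):
    """Expand a modifiers attribute into the set of exact key sets it matches.

    None stands for a keyMap without the attribute, i.e. the base map.
    """
    if attribute is None:
        return {frozenset()}
    matches = set()
    for combination in attribute.split():
        alternatives = [_states(token) for token in combination.split("+")]
        for choice in product(*alternatives):
            matches.add(frozenset().union(*choice))
    return matches


def overlaps(modifiers_a, modifiers_b):
    """True if two keyMap elements would match some common key state."""
    return bool(expand_modifiers(modifiers_a) & expand_modifiers(modifiers_b))
```

Under this model, `overlaps("ctrl+shift?", "ctrl")` is true, which is why the pair of key maps in the example above is invalid, and `expand_modifiers("ctrl+shift?")` equals `expand_modifiers("ctrl ctrl+shift")`.
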
* * *

### 5.9 <a name="Element_map" href="#Element_map">Element: map</a>

This element defines a mapping between the base character and the output for a particular set of active modifier keys. This element must have the `keyMap` element as its parent.

If no `map` element has been defined for a particular ISO layout position, pressing that key produces no output.

**Syntax**

```xml
<map
 iso="{the iso position}"
 to="{the output}"
 [longPress="{long press keys}"]
 [transform="no"]
 [multitap="{the output on subsequent taps}"]
 [longPress-status="optional"]
 [optional="{optional mappings}"]
 [hint="{hint to long press content}"]
 /><!-- {Comment to improve readability (if needed)} -->
```

> <small>
>
> Parents: [keyMap](#Element_keyMap)
> Children: _none_
> Occurrence: optional, multiple
>
> </small>

621_Attribute:_ `iso` (exactly one of base and iso is required)
622
623> The `iso` attribute represents the ISO layout position of the key (see the definition at the beginning of the document for more information).
624
625_Attribute:_ `to` (required)
626
627> The `to` attribute contains the output sequence of characters that is emitted when pressing this particular key. Control characters, whitespace (other than the regular space character) and combining marks in this attribute are escaped using the `\u{...}` notation.
628
_Attribute:_ `longPress` (optional)

> The `longPress` attribute contains any characters that can be emitted by "long-pressing" a key. This feature is prominent on mobile devices. The possible sequences of characters that can be emitted are whitespace delimited. Control characters, combining marks and whitespace (which is intended to be a long-press option) in this attribute are escaped using the `\u{...}` notation.
632
633_Attribute:_ `transform="no"` (optional)
634
> The `transform` attribute is used to define a key that never participates in a transform, even though its output appears as part of a transform. This attribute is necessary because two different keys could output the same characters (with different keys or modifier combinations) but only one of them is intended to be a dead-key and participate in a transform. If present, the attribute value must be `no`.
636
637_Attribute:_ `multitap` (optional)
638
639> A space-delimited list of strings, where each successive element of the list is produced by the corresponding number of quick taps. For example, three taps on the key C01 will produce a “c” in the following example (first tap produces “a”, two taps produce “bb” etc.).
640>
641> _Example:_
642>
643> ```xml
> <map iso="C01" to="a" multitap="bb c d" />
645> ```
646> Control characters, combining marks and whitespace (which is intended to be a multitap option) in this attribute are escaped using the `\u{...}` notation.
647
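As a rough illustration of the `multitap` attribute, the sketch below returns the output for a given number of quick taps. The function name is hypothetical, `\u{...}` escapes are not decoded, and raising an error for tap counts beyond the list is an assumption, since the text does not define that case.

```python
def multitap_output(to, multitap, taps):
    """Return the string produced by `taps` quick taps on a key."""
    # The first tap yields `to`; later taps walk the space-delimited list.
    outputs = [to] + multitap.split()
    if not 1 <= taps <= len(outputs):
        # Assumption: tap counts outside the list are treated as an error.
        raise ValueError(f"no output defined for {taps} taps")
    return outputs[taps - 1]


# <map iso="C01" to="a" multitap="bb c d" />
print([multitap_output("a", "bb c d", n) for n in (1, 2, 3, 4)])
```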
648_Attribute:_ `longPress-status` (optional)
649
650> Indicates optional `longPress` values. Must only occur with a `longPress` value. May be suppressed or shown, depending on user settings. There can be two `map` elements that differ only by `longPress-status`, allowing two different sets of `longPress` values.
651>
652> _Example:_
653>
654> ```xml
655> <map iso="D01" to="a" longPress="à â % æ á ä ã å ā ª" />
656> <map iso="D01" to="a" longPress="à â á ä ã å ā" longPress-status="optional" />
657> ```
658
659_Attribute:_ `optional` (optional)
660
661> Indicates optional mappings. May be suppressed or shown, depending on user settings.
662
663_Attribute:_ `hint` (optional)
664
> Indicates a hint as to long-press contents, such as the first character of the `longPress` value, that can be displayed on the key. May be suppressed or shown, depending on user settings. Characters in this attribute can be escaped using the `\u{...}` notation.
666>
667> _Example:_ where the hint is "{":
668>
669> ![keycap hint](images/keycapHint.png)
670
671For example, suppose there are the following keys, their output and one transform:
672
673```
674E00 outputs `
675Option+E00 outputs ` (the dead-version which participates in transforms).
676`e → è
677```
678
679Then the first key must be tagged with `transform="no"` to indicate that it should never participate in a transform.
680
681Comment: US key equivalent, base key, escaped output and escaped longpress
682
683In the generated files, a comment is included to help the readability of the document. This comment simply shows the English key equivalent (with prefix `key=`), the base character (`base=`), the escaped output (`to=`) and escaped long-press keys (`long=`). These comments have been inserted strategically in places to improve readability. Not all comments include all components since some of them may be obvious.
684
685**Example**
686
```xml
<keyboard locale="fr-BE-t-k0-windows">
    <keyMap modifiers="shift">
        <map iso="D01" to="A" /> <!-- key=Q -->
        <map iso="D02" to="Z" /> <!-- key=W -->
        <map iso="D03" to="E" />
        <map iso="D04" to="R" />
        <map iso="D05" to="T" />
        <map iso="D06" to="Y" />
    </keyMap>
</keyboard>
```
702
```xml
<keyboard locale="ps-t-k0-windows">
    <keyMap modifiers='altR+caps? ctrl+alt+caps?'>
        <map iso="D04" to="\u{200e}" /> <!-- key=R base=ق -->
        <map iso="D05" to="\u{200f}" /> <!-- key=T base=ف -->
        <map iso="D08" to="\u{670}" /> <!-- key=I base=ه to= ٰ -->
    </keyMap>
</keyboard>
```
715
716* * *
717
718#### 5.9.1 <a name="Element_flicks" href="#Element_flicks">Elements: flicks, flick</a>
719
720The `flicks` element is used to generate results from a "flick" of the finger on a mobile device.
721
722**Syntax**
723
724```xml
725<flicks iso="{the iso position}">
726    {a set of flick elements}
727</flicks>
728```
729
730> <small>
731>
732> Parents: [keyMap](#Element_keyMap)
733> Children: [flick](#Element_flicks)
> Occurrence: optional, multiple
735>
736> </small>
737
738_Attribute:_ `iso` (required)
739
740> The `iso` attribute represents the ISO layout position of the key (see the definition at the beginning of the document for more information).
741
742**Syntax**
743
744```xml
745<flick directions="{list of directions}" to="{the output}" />
746```
747
748> <small>
749>
750> Parents: [flicks](#Element_flicks)
751> Children: _none_
> Occurrence: required, multiple
753>
754> </small>
755
756_Attribute:_ `directions` (required)
757
758> The `directions` attribute value is a space-delimited list of keywords, that describe a path, currently restricted to the cardinal and intercardinal directions `{n e s w ne nw se sw}`.
759
760_Attribute:_ `to` (required)
761
> The `to` attribute value is the result of (one or more) flicks.
763
**Example**

In this example, a flick to the northeast and then to the south produces two code points.
766
767```xml
768<flicks iso="C01">
769    <flick directions="ne s" to="\uABCD\uDCBA" />
770</flicks>
771```
772
773* * *
774
775### 5.10 <a name="Element_import" href="#Element_import">Element: import</a>
776
777The `import` element references another file of the same type and includes all the subelements of the top level element as though the `import` element were being replaced by those elements, in the appropriate section of the XML file. For example:
778
779**Syntax**
780
781```xml
<import path="standard_transforms.xml" />
783```
784
785> <small>
786>
787> Parents: [keyboard](#Element_keyboard)
788> Children: _none_
> Occurrence: optional, multiple
790>
791> </small>
792
793_Attribute:_ `path` (required)
794
> The value contains a relative path to the included LDML file. An application may provide a standard set of directories to be searched. The directory in which the current file is stored is always prepended to this set.
796
If two identical elements, as described below, are defined, the later element takes precedence. Thus if a `keyMap/map` for the same key in the same `keyMap` is defined twice (for example, once in an imported file), the later one is the resulting mapping.

Elements are considered to have three properties that make them unique: the tag of the element, the parent, and the distinguishing attribute. The parent is in turn a unique element, and so on up the chain. If the distinguishing attribute is optional, its absence is represented by an empty value. Here is a list of elements and their distinguishing attributes. If an element is not listed and it is a leaf element, only one occurs and it is simply replaced. If it has children, its subelements are considered in turn, in effect merging the element in question.
800
801| Element      | Parent       | Distinguishing attribute     |
802|--------------|--------------|------------------------------|
803| `import`     | `keyboard`   | `@path`                      |
804| `keyMap`     | `keyboard`   | `@modifiers`                 |
805| `map`        | `keyMap`     | `@iso`                       |
806| `flicks`     | `keyMap`     | `@iso`                       |
807| `flick`      | `flicks`     | `@directions`                |
808| `display`    | `displayMap` | `@to`                        |
809| `layer`      | `keyboard`   | `@modifier`                  |
810| `row`        | `layer`      | `@keys`                      |
811| `switch`     | `layer`      | `@iso`                       |
812| `vkeys`      | `layer`      | `@iso`                       |
813| `transforms` | `keyboard`   | `@type`                      |
| `transform`  | `transforms` | `@before`, `@from`, `@after` |
815| `reorder`    | `reorders`   | `@before`, `@from`, `@after` |
816| `backspace`  | `backspaces` | `@before`, `@from`, `@after` |
817
In order to help identify mistakes, it is an error if a file contains two elements that override each other. All element overrides must come as a result of an `<import>` element, either for the element overridden or for the element overriding.
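The override and merge behavior described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a simplified illustration under stated assumptions: the function names are hypothetical, only a subset of the table is encoded, the `import` element is assumed to precede all other content, and the elements that are not imported (`version`, `generation`, `names`, `settings`) are filtered out of the imported file.

```python
import copy
import xml.etree.ElementTree as ET

# Distinguishing attributes, taken from the table above (subset).
DISTINGUISHING = {
    "import": ("path",),
    "keyMap": ("modifiers",),
    "map": ("iso",),
    "flicks": ("iso",),
    "flick": ("directions",),
    "display": ("to",),
    "layer": ("modifier",),
    "transforms": ("type",),
    "transform": ("before", "from", "after"),
}
NOT_IMPORTED = {"version", "generation", "names", "settings"}


def identity(elem):
    """Tag plus distinguishing attribute values; a missing one is empty."""
    attrs = DISTINGUISHING.get(elem.tag, ())
    return (elem.tag,) + tuple(elem.get(a, "") for a in attrs)


def merge_into(target, children):
    """Merge `children` into `target`; later elements take precedence."""
    index = {identity(c): c for c in target}
    for child in children:
        existing = index.get(identity(child))
        if existing is None:
            new = copy.deepcopy(child)
            target.append(new)
            index[identity(new)] = new
        elif len(existing) == 0 and len(child) == 0:
            # Leaf element: the later definition replaces the earlier one.
            existing.attrib.clear()
            existing.attrib.update(child.attrib)
        else:
            # Element with children: merge the subelements.
            merge_into(existing, child)


def resolve_import(local_root, imported_root):
    """Assumption: the import comes first, so local elements are 'later'."""
    merged = ET.Element(local_root.tag, dict(local_root.attrib))
    merge_into(merged, [c for c in imported_root if c.tag not in NOT_IMPORTED])
    merge_into(merged, [c for c in local_root if c.tag != "import"])
    return merged
```

With an imported file that maps `C01` to `a` and a local file that maps `C01` to `q` in the same `keyMap`, the merged result keeps a single `keyMap` in which `C01` maps to `q`.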
819
820The following elements are not imported from the source file:
821
822* `version`
823* `generation`
824* `names`
825* `settings`
826
827* * *
828
829### 5.11 <a name="Element_displayMap" href="#Element_displayMap">Element: displayMap</a>
830
The `displayMap` can be used to describe what is to be displayed on the keytops for various keys. For the most part, such explicit information is unnecessary since the `@to` attribute from the `keyMap/map` element can be used. But some characters, such as diacritics, do not display well on their own, so explicit overrides for such characters can help. The `displayMap` consists of a list of `display` subelements.

A `displayMap` is designed to be shared across many different keyboard layout descriptions and imported where needed.
834
835**Syntax**
836
837```xml
838<displayMap>
839    {a set of display elements}
840</displayMap>
841```
842
843> <small>
844>
845> Parents: [keyboard](#Element_keyboard)
846> Children: [display](#Element_display)
> Occurrence: optional, single
848>
849> </small>
850
851* * *
852
853### 5.12 <a name="Element_display" href="#Element_display">Element: display</a>
854
855The `display` element describes how a character, that has come from a `keyMap/map` element, should be displayed on a keyboard layout where such display is possible.
856
857**Syntax**
858
859```xml
860<display to="{the output}" display="{show as}" />
861```
862
863> <small>
864>
865> Parents: [displayMap](#Element_displayMap)
866> Children: _none_
> Occurrence: required, multiple
868>
869> </small>
870
871_Attribute:_ `to` (required)
872
873> Specifies the character or character sequence from the `keyMap/map` element that is to have a special display.
874
875_Attribute:_ `display` (required)
876
> Specifies the character sequence that should be displayed on the keytop for any key that generates the `@to` sequence. (It is an error if the value of the `display` attribute is the same as the value of the `to` attribute.)
878
879**Example**
880
```xml
<keyboard>
    <keyMap>
        <map iso="C01" to="a" longPress="\u0301 \u0300" />
    </keyMap>
    <displayMap>
        <display to="\u0300" display="\u02CB" />
        <display to="\u0301" display="\u02CA" />
    </displayMap>
</keyboard>
```
892
893To allow `displayMap`s to be shared across descriptions, there is no requirement that `@to` in a `display` element matches any `@to` in any `keyMap/map` element in the keyboard description.
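A keytop label lookup along these lines might look as follows. This is only a sketch: `keytop_labels` is a hypothetical helper, it compares the attribute values exactly as written (escapes are not decoded), and it falls back to the `to` value whenever the `displayMap` has no entry.

```python
import xml.etree.ElementTree as ET

LAYOUT = r"""
<keyboard>
    <keyMap>
        <map iso="C01" to="a" longPress="\u0301 \u0300" />
    </keyMap>
    <displayMap>
        <display to="\u0300" display="\u02CB" />
        <display to="\u0301" display="\u02CA" />
    </displayMap>
</keyboard>
"""


def keytop_labels(root):
    """Map each output string to the text to show on the keytop."""
    overrides = {d.get("to"): d.get("display") for d in root.iter("display")}
    labels = {}
    for key in root.iter("map"):
        outputs = [key.get("to")] + key.get("longPress", "").split()
        for out in outputs:
            # Use the override if one exists, otherwise show the output.
            labels[out] = overrides.get(out, out)
    return labels


labels = keytop_labels(ET.fromstring(LAYOUT.strip()))
print(labels)
```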
894
895* * *
896
897### 5.13 <a name="Element_layer" href="#Element_layer">Element: layer</a>
898
899A `layer` element describes the configuration of keys on a particular layer of a keyboard. It contains one or more `row` elements to describe which keys exist in each `row` and optionally one or more `switch` elements that describe how keys in the layer switch the layer to another. In addition, for platforms that require a mapping from a key to a virtual key (for example Windows or Mac) there is also an optional `vkeys` element to describe the mapping.
900
901**Syntax**
902
903```xml
904<layer modifier="{Set of Modifier Combinations}">
905    ...
906</layer>
907```
908
909> <small>
910>
911> Parents: [keyboard](#Element_keyboard)
912> Children: [row](#Element_row), [switch](#Element_switch), [vkeys](#Element_vkeys)
> Occurrence: optional, multiple
914>
915> </small>
916
917_Attribute:_ `modifier` (required)
918
> This has two roles. It acts as an identifier for the `layer` element and also provides the linkage into a `keyMap`. A modifier is a single modifier combination such that it is matched by one of the modifier combinations in one of the `keyMap/@modifiers` attributes. To indicate that no modifiers apply, the reserved name "none" is used. For the purposes of fallback vkey mapping, the following modifier components are reserved: "shift", "ctrl", "alt", "caps", "cmd", "opt", along with the optional single suffixes "L" and "R" for the first three in that list. There must be a `keyMap` whose `@modifiers` attribute matches the `@modifier` attribute of the `layer` element. It is an error if there is no such `keyMap`.

A `keyMap/@modifiers` attribute often includes multiple combinations that match. It is not necessary (or preferred) to include all of these. Instead, a minimal matching combination should be used, such that exactly one `keyMap` is matched.

The following are examples of situations where the `@modifiers` and `@modifier` do not match, with a different `keyMap` definition than above.
924
925| `keyMap/@modifiers` | `layer/@modifier`   |
926|---------------------|---------------------|
927| `shiftL`            | `shift` (ambiguous) |
928| `altR`              | `alt`               |
929| `shiftL?+shiftR`    | `shift`             |
930
931And these do match:
932
933| `keyMap/@modifiers` | `layer/@modifier` |
934|---------------------|-------------------|
935| `shiftL shiftR`     | `shift`           |
936
Using `@modifier` as the identifier for a layer is sufficient, since it is always unique among the set of `layer` elements in a keyboard.
938
939* * *
940
941### 5.14 <a name="Element_row" href="#Element_row">Element: row</a>
942
A `row` element describes the keys that are present in a row of a keyboard. `row` elements are ordered within a `layer` element, with the top visual row being stored first.
944
945The row element introduces the `keyId` which may be an `ISOKey` or a `specialKey`. More formally:
946
947```
948keyId = ISOKey | specialKey
949ISOKey = [A-Z][0-9][0-9]
950specialKey = [a-z][a-zA-Z0-9]{2,7}
951```
952
953ISOKey denotes a key having an [ISO Position](#Definitions). SpecialKey is used to identify functional keys occurring on a virtual keyboard layout.
954
955**Syntax**
956
957```xml
958<row keys="{keyId}" />
959```
960
961> <small>
962>
963> Parents: [layer](#Element_layer)
964> Children: _none_
> Occurrence: required, multiple
966>
967> </small>
968
969_Attribute:_ `keys` (required)
970
971> This is a string that lists the `keyId` for each of the keys in a row. Key ranges may be contracted to firstkey-lastkey but only for `ISOKey` type `keyId`s. The interpolation between the first and last keys names is entirely numeric. Thus `D00-D03` is equivalent to `D00 D01 D02 D03`. It is an error if the first and last keys do not have the same alphabetic prefix or the last key numeric component is less than or equal to the first key numeric component.
972
973`specialKey` type `keyId`s may take any value within their syntactic constraint. But the following `specialKey`s are reserved to allow applications to identify them and give them special handling:
974
* `"bksp"`, `"enter"`, `"space"`, `"tab"`, `"esc"`, `"sym"`, `"num"`
976* all the reserved modifier names
977* specialKeys starting with the letter "x" for future reserved names.
978
979**Example**
980
981Here is an example of a `row` element:
982
983```xml
984<layer modifier="none">
985    <row keys="D01-D10" />
986    <row keys="C01-C09" />
987    <row keys="shift B01-B07 bksp" />
988    <row keys="sym A01 smilies A02-A03 enter" />
989</layer>
990```
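The range contraction allowed in the `keys` attribute can be expanded mechanically. The following sketch uses a hypothetical `expand_keys` helper; it applies the `ISOKey` pattern and the two error conditions stated above, and passes any other `keyId` through unchanged.

```python
import re

ISO_KEY = re.compile(r"^([A-Z])([0-9]{2})$")


def expand_keys(keys):
    """Expand a `row/@keys` value into a flat list of keyIds."""
    result = []
    for token in keys.split():
        if "-" not in token:
            result.append(token)          # single ISOKey or specialKey
            continue
        first, last = token.split("-", 1)
        m1, m2 = ISO_KEY.match(first), ISO_KEY.match(last)
        if not (m1 and m2):
            raise ValueError(f"ranges are only allowed for ISOKeys: {token}")
        if m1.group(1) != m2.group(1):
            raise ValueError(f"different alphabetic prefixes: {token}")
        lo, hi = int(m1.group(2)), int(m2.group(2))
        if hi <= lo:
            raise ValueError(f"last key must be greater than first: {token}")
        # Interpolation between the first and last key is purely numeric.
        result.extend(f"{m1.group(1)}{n:02d}" for n in range(lo, hi + 1))
    return result


print(expand_keys("D00-D03"))
print(expand_keys("shift B01-B03 bksp"))
```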
991
992* * *
993
994### 5.15 <a name="Element_switch" href="#Element_switch">Element: switch</a>
995
996The `switch` element describes a function key that has been included in the layout. It specifies which layer pressing the key switches you to and also what the key looks like.
997
998**Syntax**
999
1000```xml
1001<switch iso="{specialKey}"
1002        layer="{Set of Modifier Combinations}"
1003        display="{show as}" />
1004```
1005
1006> <small>
1007>
1008> Parents: [layer](#Element_layer)
1009> Children: _none_
> Occurrence: optional, multiple
1011>
1012> </small>
1013
1014_Attribute:_ `iso` (required)
1015
1016> The `keyId` as specified in one of the `row` elements. This must be a `specialKey` and not an `ISOKey`.
1017
1018_Attribute:_ `layer` (required)
1019
1020> The modifier attribute of the resulting `layer` element that describes the layer the user gets switched to.
1021
1022_Attribute:_ `display` (required)
1023
1024> A string to be displayed on the key.
1025
1026**Example**
1027
1028Here is an example of a `switch` element for a shift key:
1029
1030```xml
1031<layer modifier="none">
1032    <row keys="D01-D10" />
1033    <row keys="C01-C09" />
1034    <row keys="shift B01-B07 bksp" />
1035    <row keys="sym A01 smilies A02-A03 enter" />
1036    <switch iso="shift" layer="shift" display="&#x21EA;" />
1037</layer>
1038<layer modifier="shift">
1039    <row keys="D01-D10" />
1040    <row keys="C01-C09" />
1041    <row keys="shift B01-B07 bksp" />
1042    <row keys="sym A01 smilies A02-A03 enter" />
1043    <switch iso="shift" layer="none" display="&#x21EA;" />
1044</layer>
1045```
1046
1047* * *
1048
1049### 5.16 <a name="Element_vkeys" href="#Element_vkeys">Element: vkeys</a>
1050
1051On some architectures, applications may directly interact with keys before they are converted to characters. The keys are identified using a virtual key identifier or vkey. The mapping between a physical keyboard key and a vkey is keyboard-layout dependent. For example, a French keyboard would identify the D01 key as being an 'a' with a vkey of 'a' as opposed to 'q' on a US English keyboard. While vkeys are layout dependent, they are not modifier dependent. A shifted key always has the same vkey as its unshifted counterpart. In effect, a key is identified by its vkey and the modifiers active at the time the key was pressed.
1052
1053**Syntax**
1054
1055```xml
1056<vkeys>
1057    {a set of vkey elements}
1058</vkeys>
1059```
1060
1061> <small>
1062>
1063> Parents: [layer](#Element_layer), [keyboard](#Element_keyboard)
1064> Children: [vkey](#Element_vkey)
> Occurrence: optional, multiple
1066>
1067> </small>
1068
1069_Attribute:_ `type`
1070
1071> Current values: android, chromeos, osx, und, windows.
1072
1073For a physical keyboard there is a layout specific default mapping of keys to vkeys. These are listed in a `vkeys` element which takes a list of `vkey` element mappings and is identified by a type. There are different vkey mappings required for different platforms. While `type="windows"` vkeys are very similar to `type="osx"` vkeys, they are not identical and require their own mapping.
1074
1075The most common model for specifying vkeys is to import a standard mapping, say to the US layout, and then to add a `vkeys` element to change the mapping appropriately for the specific layout.
1076
In addition to describing physical keyboards, vkeys are also used in virtual keyboards. Here the vkey mapping is local to a layer, and therefore a `vkeys` element may occur within a `layer` element. Where a `layer` element has no `vkeys` element, the resulting mapping may either be empty (none of the keys represent keys that have vkey identifiers) or may fall back to the layout-wide vkeys mapping. Fallback only occurs if the layer's `modifier` attribute consists only of standard modifiers, as listed as being reserved in the description of the `layer/@modifier` attribute, and if the modifiers are standard for the platform involved. So for Windows, `"cmd"` is a reserved modifier but it is not standard for Windows. Therefore on Windows the vkey mapping for a layer with `@modifier="cmd"` would be empty.
1078
1079A `vkeys` element consists of a list of `vkey` elements.
1080
1081* * *
1082
1083### 5.17 <a name="Element_vkey" href="#Element_vkey">Element: vkey</a>
1084
1085A `vkey` element describes a mapping between a key and a vkey for a particular platform.
1086
1087**Syntax**
1088
1089```xml
1090<vkey iso="{iso position}" vkey="{identifier}"
1091      [modifier="{Set of Modifier Combinations}"] />
1092```
1093
1094> <small>
1095>
1096> Parents: [vkeys](#Element_vkeys)
1097> Children: _none_
> Occurrence: required, multiple
1099>
1100> </small>
1101
1102_Attribute:_ `iso` (required)
1103
1104> The ISOkey being mapped.
1105
1106_Attribute:_ `vkey` (required)
1107
1108> The resultant vkey identifier (the value is platform specific).
1109
1110_Attribute:_ `modifier`
1111
> This attribute may only be used if the parent `vkeys` element is a child of a `layer` element. If present, it allows an unmodified key from a layer to represent a modified virtual key.
1113
1114**Example**
1115
1116This example shows some of the mappings for a French keyboard layout:
1117
_shared/win-vkey.xml_

1119```xml
1120<keyboard>
1121    <vkeys type="windows">
1122        <vkey iso="D01" vkey="VK_Q" />
1123        <vkey iso="D02" vkey="VK_W" />
1124        <vkey iso="C01" vkey="VK_A" />
1125        <vkey iso="B01" vkey="VK_Z" />
1126    </vkeys>
1127</keyboard>
1128```
1129
1130_shared/win-fr.xml_
1131
1132```xml
1133<keyboard>
    <import path="shared/win-vkey.xml" />
1135    <keyMap>
1136        <map iso="D01" to="a" />
1137        <map iso="D02" to="z" />
1138        <map iso="C01" to="q" />
1139        <map iso="B01" to="w" />
1140    </keyMap>
1141    <keyMap modifiers="shift">
1142        <map iso="D01" to="A" />
1143        <map iso="D02" to="Z" />
1144        <map iso="C01" to="Q" />
1145        <map iso="B01" to="W" />
1146    </keyMap>
1147    <vkeys type="windows">
1148        <vkey iso="D01" vkey="VK_A" />
1149        <vkey iso="D02" vkey="VK_Z" />
1150        <vkey iso="C01" vkey="VK_Q" />
1151        <vkey iso="B01" vkey="VK_W" />
1152    </vkeys>
1153</keyboard>
1154```
1155
1156In the context of a virtual keyboard there might be a symbol layer with the following layout:
1157
1158```xml
1159<keyboard>
1160    <keyMap>
1161        <map iso="D01" to="1" />
1162        <map iso="D02" to="2" />
1163        ...
1164        <map iso="D09" to="9" />
1165        <map iso="D10" to="0" />
1166        <map iso="C01" to="!" />
1167        <map iso="C02" to="@" />
1168        ...
1169        <map iso="C09" to="(" />
1170        <map iso="C10" to=")" />
1171    </keyMap>
1172    <layer modifier="sym">
1173        <row keys="D01-D10" />
1174        <row keys="C01-C09" />
1175        <row keys="shift B01-B07 bksp" />
1176        <row keys="sym A00-A03 enter" />
1177        <switch iso="sym" layer="none" display="ABC" />
        <switch iso="shift" layer="sym+shift" display="&amp;=/&lt;" />
1179        <vkeys type="windows">
1180            <vkey iso="D01" vkey="VK_1" />
1181            ...
1182            <vkey iso="D10" vkey="VK_0" />
1183            <vkey iso="C01" vkey="VK_1" modifier="shift" />
1184            ...
1185            <vkey iso="C10" vkey="VK_0" modifier="shift" />
1186        </vkeys>
1187    </layer>
1188</keyboard>
1189```
1190
1191* * *
1192
1193### 5.18 <a name="Element_transforms" href="#Element_transforms">Element: transforms</a>
1194
1195This element defines a group of one or more `transform` elements associated with this keyboard layout. This is used to support features such as dead-keys, character reordering, etc. using a straightforward structure that works for all the keyboards tested, and that results in readable source data.
1196
There can be multiple `<transforms>` elements.

**Syntax**
1200
1201```xml
1202<transforms type="...">
1203    {a set of transform elements}
1204</transforms>
1205```
1206
1207> <small>
1208>
1209> Parents: [keyboard](#Element_keyboard)
1210> Children: [transform](#Element_transform)
1211> Occurence: optional, multiple
1212>
1213> </small>
1214
1215_Attribute:_ `type` (required)
1216
1217> Current values: `simple`, `final`.
1218
There are other keying behaviors that are needed, particularly in handling complex orthographies from various parts of the world. The behaviors intended to be covered by the transforms are:
1221
1222* Reordering combining marks. The order required for underlying storage may differ considerably from the desired typing order. In addition, a keyboard may want to allow for different typing orders.
1223* Error indication. Sometimes a keyboard layout will want to specify to the application that a particular keying sequence in a context is in error and that the application should indicate that that particular keypress is erroneous.
1224* Backspace handling. There are various approaches to handling the backspace key. An application may treat it as an undo of the last key input, or it may simply delete the last character in the currently output text, or it may use transform rules to tell it how much to delete.
1225
1226We consider each transform type in turn and consider attributes to the `<transforms>` element pertinent to that type.
1227
1228* * *
1229
1230### 5.19 <a name="Element_transform" href="#Element_transform">Element: transform</a>
1231
This element must have the `transforms` element as its parent. This element represents a single transform that may be performed using the keyboard layout. A transform is an element that specifies a set of conversions from sequences of code points into one (or more) other code points. For example, on most French keyboards, pressing the "^" dead-key followed by the "e" key produces "ê".
1233
1234**Syntax**
1235
1236```xml
1237<transform from="{combination of characters}" to="{output}"
1238   [before="{look-behind required match}"]
1239   [after="{look-ahead required match}"]
1240   [error="fail"] />
1241```
1242
1243> <small>
1244>
1245> Parents: [transforms](#Element_transforms)
1246> Children: _none_
1247> Occurence: required, multiple
1248>
1249> </small>
1250
1251_Attribute:_ `from` (required)
1252
1253> The `from` attribute consists of a sequence of elements. Each element matches one character and may consist of a codepoint or a UnicodeSet (both as defined in [UTS#35 section 5.3.3](https://www.unicode.org/reports/tr35/#Unicode_Sets)).
1254
1255For example, suppose there are the following transforms:
1256
1257```
1258^e → ê
1259^a → â
1260^o → ô
1261```
1262
1263If the user types a key that produces "\^", the keyboard enters a dead state. When the user then types a key that produces an "e", the transform is invoked, and "ê" is output. Suppose a user presses keys producing "\^" then "u". In this case, there is no match for the "\^u", and the "\^" is output if the `transformFailure` attribute in the `settings` element is set to emit. If there is no transform starting with "u", then it is also output (again only if `transformFailure` is set to emit) and the mechanism leaves the "dead" state.
1264
1265The UI may show an initial sequence of matching characters with a special format, as is done with dead-keys on the Mac, and modify them as the transform completes. This behavior is specified in the `partial` attribute in the `transform` element.
1266
1267Most transforms in practice have only a couple of characters. But for completeness, the behavior is defined on all strings. The following applies when no exact match exists:
1268
1. If there could be a longer match if the user were to type additional keys, go into a 'dead' state.
2. If there could not be a longer match, find the longest actual match, emit the transformed text (if `transformFailure` is set to emit), and start processing again with the remainder.
3. If there is no possible match, output the first character, and start processing again with the remainder.
1272
1273Suppose that there are the following transforms:
1274
1275```
1276ab → x
1277abc → y
1278abef → z
1279bc → m
1280beq → n
1281```
1282
Here's what happens when the user types various sequences of characters:
1284
1285| Input characters | Result | Comments |
1286|------------------|--------|----------|
1287| ab               |        | No output, since there is a longer transform with this as prefix. |
1288| abc              | y      | Complete transform match. |
1289| abd              | xd     | The longest match is "ab", so that is converted and output. The 'd' follows, since it is not the start of any transform. |
1290| abeq             | xeq    | "ab" wins over "beq", since it comes first. That is, there is no longer possible match starting with 'a'. |
1291| bc               | m      |          |
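
The three matching rules and the table above can be reproduced with a short Python sketch. It is illustrative only: `apply_transforms` is a hypothetical name, `from` values are treated as literal strings (no UnicodeSet support), `@before` and `@after` are ignored, and `transformFailure` is assumed to be set to emit. The function returns the emitted text together with any input still held in the 'dead' state.

```python
RULES = {"ab": "x", "abc": "y", "abef": "z", "bc": "m", "beq": "n"}


def apply_transforms(text, rules):
    """Return (output, pending) for the typed characters in `text`."""
    out = []
    pos = 0
    while pos < len(text):
        rest = text[pos:]
        # Rule 1: a longer match is still possible, so wait for more keys.
        if any(src != rest and src.startswith(rest) for src in rules):
            return "".join(out), rest
        # Rule 2: otherwise take the longest transform matching here.
        best = max(
            (src for src in rules if rest.startswith(src)),
            key=len,
            default=None,
        )
        if best is not None:
            out.append(rules[best])
            pos += len(best)
        else:
            # Rule 3: no possible match, so output the first character.
            out.append(rest[0])
            pos += 1
    return "".join(out), ""


for typed in ("ab", "abc", "abd", "abeq", "bc"):
    print(typed, apply_transforms(typed, RULES))
```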
1292
1293Control characters, combining marks and whitespace in this attribute are escaped using the `\u{...}` notation.
1294
1295_Attribute:_ `to` (required)
1296
1297> This attribute represents the characters that are output from the transform. The output can contain more than one character, so you could have `<transform from="´A" to="Fred"/>`
1298
1299Control characters, whitespace (other than the regular space character) and combining marks in this attribute are escaped using the `\u{...}` notation.
1300
**Examples**
1302
1303```xml
1304<keyboard locale="fr-CA-t-k0-CSA-osx">
1305    <transforms type="simple">
1306        <transform from="´a" to="á" />
1307        <transform from="´A" to="Á" />
1308        <transform from="´e" to="é" />
1309        <transform from="´E" to="É" />
1310        <transform from="´i" to="í" />
1311        <transform from="´I" to="Í" />
1312        <transform from="´o" to="ó" />
1313        <transform from="´O" to="Ó" />
1314        <transform from="´u" to="ú" />
1315        <transform from="´U" to="Ú" />
1316    </transforms>
1317    ...
1318</keyboard>
1319```
1320
1321```xml
1322<keyboard locale="nl-BE-t-k0-chromeos">
1323    <transforms type="simple">
1324        <transform from="\u{30c}a" to="ǎ" /> <!-- ̌a → ǎ -->
1325        <transform from="\u{30c}A" to="Ǎ" /> <!-- ̌A → Ǎ -->
1326        <transform from="\u{30a}a" to="å" /> <!-- ̊a → å -->
1327        <transform from="\u{30a}A" to="Å" /> <!-- ̊A → Å -->
1328    </transforms>
1329    ...
1330</keyboard>
1331```
1332
1333_Attribute:_ `before` (optional)
1334
1335> This attribute consists of a sequence of elements (codepoint or UnicodeSet) to match the text up to the current position in the text (this is similar to a regex "look behind" assertion: `(?<=a)b` matches a "b" that is preceded by an "a"). The attribute must match for the transform to apply. If missing, no before constraint is applied. The attribute value must not be empty.
1336
1337_Attribute:_ `after` (optional)
1338
1339> This attribute consists of a sequence of elements (codepoint or UnicodeSet) and matches as a zero-width assertion after the `@from` sequence. The attribute must match for the transform to apply. If missing, no after constraint is applied. The attribute value must not be empty. When the transform is applied, the string matched by the `@from` attribute is replaced by the string in the `@to` attribute, with the text matched by the `@after` attribute left unchanged. After the change, the current position is reset to just after the text output from the `@to` attribute and just before the text matched by the `@after` attribute. Warning: some legacy implementations may not be able to make such an adjustment and will place the current position after the `@after` matched string.
1340
1341_Attribute:_ `error="fail"` (optional)
1342
> If set, this attribute indicates that the keyboarding application may indicate an error to the user in some way. Processing may stop and rewind to any state before the key was pressed. If processing does stop, no further transforms on the same input are applied. The `@error` attribute takes the value `"fail"`, or must be absent. If processing continues, the `@to` is used for output as normal. It thus should contain a reasonable value.
1344
1345For example:
1346
1347```xml
1348<transform from="\u037A\u037A" to="\u037A" error="fail" />
1349```
1350
1351This indicates that it is an error to type two iota subscripts immediately after each other.
1352
In terms of how these different attributes work in processing a sequence of transforms, consider the transform:
1354
1355```xml
1356<transform before="X" from="Y" after="Z" to="B" />
1357```
1358
1359This would transform the string:
1360
1361```
1362XYZ → XBZ
1363```
1364
1365If we mark where the current match position is before and after the transform we see:
1366
1367```
1368X | Y Z → X B | Z
1369```
1370
A subsequent transform could then transform the Z, looking back (using `@before`) to match the B.
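
The handling of `@before`, `@from`, and `@after` around the current position can be sketched in Python. This is a minimal illustration rather than a normative implementation: the function name `apply_transform` is hypothetical, and attribute values are treated as Python regular expressions, which only approximates the codepoint and UnicodeSet element syntax.

```python
import re

def apply_transform(text, pos, frm, to, before="", after=""):
    """Try one transform at index `pos`; return (new_text, new_pos) or None."""
    # @before must match the text that ends at the current position.
    if not re.search(f"(?:{before})\\Z", text[:pos]):
        return None
    # @from must match at the current position; @after is a zero-width check.
    m = re.compile(f"(?:{frm})(?=(?:{after}))").match(text, pos)
    if m is None:
        return None
    new_text = text[:pos] + to + text[m.end():]
    # The position ends up after the @to output and before the @after text.
    return new_text, pos + len(to)
```

Calling `apply_transform("XYZ", 1, "Y", "B", before="X", after="Z")` yields the rewritten text together with the new position, which sits between the B and the Z.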
1372
1373* * *
1374
1375### 5.20 <a name="Element_reorder" href="#Element_reorder">Element: reorders, reorder</a>
1376
The reorder transform is applied after all transforms except for those with `type="final"`.
1378
This transform has the job of reordering sequences of characters that have been typed, from their typed order to the desired output order. The primary concern in this transform is to sort combining marks into their correct relative order after a base, as described in this section. Because reorder transforms can be quite complex, keyboard layouts will almost always import them.
1380
1381The reordering algorithm consists of four parts:
1382
1. Create a sort key for each character in the input string. A sort key has 4 parts: (primary, index, tertiary, quaternary).
   * The **primary weight** is the primary order value.
   * The **secondary weight** is the index, a position in the input string, usually of the character itself, but it may be of a character earlier in the string.
   * The **tertiary weight** is a tertiary order value (defaulting to 0).
   * The **quaternary weight** is the index of the character in the string. This is solely to ensure a stable sort for sequences of characters with the same tertiary weight.
2. Mark each character as to whether it is a prebase character, one that is typed before the base and logically stored after. Thus it will have a primary order > 0.
3. Use the sort key and the prebase mark to identify runs. A run starts with a prefix that contains any prebase characters and a single base character whose primary and tertiary key is 0. The run extends until, but not including, the start of the prefix of the next run or end of the string.
   * `run := prebase* (primary=0 && tertiary=0) ((primary≠0 || tertiary≠0) && !prebase)*`
4. Sort the characters in each run by their sort keys.
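
The four steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the names `Ch`, `split_runs`, and `reorder` are hypothetical, the ordering values are assumed to have been assigned to each character already, and stray characters before the first base are assumed to form a run of their own. The anchor for a tertiary character follows the `tertiary_base` rule given with that attribute later in this section.

```python
from dataclasses import dataclass

@dataclass
class Ch:
    text: str
    order: int = 0               # primary order
    tertiary: int = 0            # tertiary order
    tertiary_base: bool = False  # may tertiary characters attach to it?
    prebase: bool = False        # typed before the base, stored after it

def is_base(c):
    return c.order == 0 and c.tertiary == 0

def split_runs(chars):
    """Return (start, end) index pairs, one per run."""
    starts = []
    for i, c in enumerate(chars):
        prev_prebase = i > 0 and chars[i - 1].prebase
        # A run begins at the first prebase of a prefix, or at a base with no prefix.
        if (c.prebase or is_base(c)) and not prev_prebase:
            starts.append(i)
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # assumption: stray leading marks form their own run
    return list(zip(starts, starts[1:] + [len(chars)]))

def reorder(chars):
    out = []
    for start, end in split_runs(chars):
        keys = {}
        anchor = (0, start)  # (primary, index) that tertiary characters inherit
        for i in range(start, end):
            c = chars[i]
            if c.tertiary == 0:  # primary character
                if c.order == 0 or c.tertiary_base:
                    anchor = (c.order, i)
                keys[i] = (c.order, i, 0, i)
            else:                # tertiary character sorts closely after its anchor
                keys[i] = (anchor[0], anchor[1], c.tertiary, i)
        out.extend(chars[i] for i in sorted(range(start, end), key=keys.get))
    return out
```

For instance, `reorder([Ch("k"), Ch("o", 42), Ch("t", 55), Ch("s", 10), Ch("w", 10)])` keeps the base `k` first and moves the two characters with order 10 ahead of those with orders 42 and 55.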
1392
The primary order of a character with the Unicode property Canonical_Combining_Class (ccc) of 0 may well not be 0. In addition, a character may receive a different primary order dependent on context. For example, in the Devanagari sequence ka halant ka, the first ka would have a primary order 0 while the halant ka sequence would give both halant and the second ka a primary order > 0, for example 2. Note that a “base” character in this discussion is not a Unicode base character. It is instead a character with primary=0.
1394
In order to get the characters into the correct relative order, it is necessary not only to order combining marks relative to the base character, but also to order some combining marks in a subsequence following another combining mark. For example in Devanagari, a nukta may follow a consonant character, but it may also follow a conjunct consisting of consonant, halant, consonant. Notice that the second consonant is not, in this model, the start of a new run because some characters may need to be reordered to before the first base, for example repha. The repha would get primary < 0, and be sorted before the character with order = 0, which is, in the case of Devanagari, the initial consonant of the orthographic syllable.
1396
The reorder transform consists of a single element type: `<reorder>` encapsulated in a `<reorders>` element. Each is a rule that matches against a string of characters with the action of setting the various ordering attributes (`order`, `tertiary`, `tertiary_base`, `prebase`) for the matched characters in the string.
1398
1399**Syntax**
1400
1401```xml
1402<reorder from="{combination of characters}"
1403   [before="{look-behind required match}"]
1404   [after="{look-ahead required match}"]
1405   [order="{list of weights}"]
1406   [tertiary="{list of weights}"]
1407   [tertiary_base="{list of true/false}"]
1408   [prebase="{list of true/false}"] />
1409```
1410
1411> <small>
1412>
1413> Parents: [reorders](#Element_reorder)
1414> Children: _none_
> Occurrence: required, multiple
1416>
1417> </small>
1418
1419_Attribute:_ `from` (required)
1420
1421> This attribute follows the `transform/@from` attribute and contains a string of elements. Each element matches one character and may consist of a codepoint or a UnicodeSet (both as defined in UTS#35 section 5.3.3).
1422
1423_Attribute:_ `before`
1424
> This attribute follows the `transform/@before` attribute and contains the element string that must match the string immediately preceding the start of the string that the `@from` matches.
1426
1427_Attribute:_ `after`
1428
1429> This attribute follows the `transform/@after` attribute and contains the element string that must match the string immediately following the end of the string that the `@from` matches.
1430
1431_Attribute:_ `order`
1432
1433> This attribute gives the primary order for the elements in the matched string in the `@from` attribute. The value is a simple integer between -128 and +127 inclusive, or a space separated list of such integers. For a single integer, it is applied to all the elements in the matched string. Details of such list type attributes are given after all the attributes are described. If missing, the order value of all the matched characters is 0. We consider the order value for a matched character in the string.
1434>
1435> * If the value is 0 and its tertiary value is 0, then the character is the base of a new run.
1436> * If the value is 0 and its tertiary value is non-zero, then it is a normal character in a run, with ordering semantics as described in the `@tertiary` attribute.
1437> * If the value is negative, then the character is a primary character and will reorder to be before the base of the run.
1438> * If the value is positive, then the character is a primary character and is sorted based on the order value as the primary key following a previous base character.
1439>
1440> A character with a zero tertiary value is a primary character and receives a sort key consisting of:
1441>
1442> * Primary weight is the order value
1443> * Secondary weight is the index of the character. This may be any value (character index, codepoint index) such that its value is greater than the character before it and less than the character after it.
1444> * Tertiary weight is 0.
1445> * Quaternary weight is the same as the secondary weight.
1446
1447_Attribute:_ `tertiary`
1448
1449> This attribute gives the tertiary order value to the characters matched. The value is a simple integer between -128 and +127 inclusive, or a space separated list of such integers. If missing, the value for all the characters matched is 0. We consider the tertiary value for a matched character in the string.
1450>
1451> * If the value is 0 then the character is considered to have a primary order as specified in its order value and is a primary character.
> * If the value is non-zero, then the order value must be zero; otherwise it is an error. The character is considered as a tertiary character for the purposes of ordering.
1453>
1454> A tertiary character receives its primary order and index from a previous character, which it is intended to sort closely after. The sort key for a tertiary character consists of:
1455>
1456> * Primary weight is the primary weight of the primary character
1457> * Secondary weight is the index of the primary character, not the tertiary character
1458> * Tertiary weight is the tertiary value for the character.
1459> * Quaternary weight is the index of the tertiary character.
1460
1461_Attribute:_ `tertiary_base`
1462
1463> This attribute is a space separated list of `"true"` or `"false"` values corresponding to each character matched. It is illegal for a tertiary character to have a true `tertiary_base` value. For a primary character it marks that this character may have tertiary characters moved after it. When calculating the secondary weight for a tertiary character, the most recently encountered primary character with a true `tertiary_base` attribute is used. Primary characters with an `@order` value of 0 automatically are treated as having `tertiary_base` true regardless of what is specified for them.
1464
1465_Attribute:_ `prebase`
1466
> This attribute gives the prebase attribute for each character matched. The value may be `"true"` or `"false"` or a space separated list of such values. If missing, the value for all the characters matched is false. It is illegal for a tertiary character to have a true prebase value.
1468>
1469> If a primary character has a true prebase value then the character is marked as being typed before the base character of a run, even though it is intended to be stored after it. The primary order gives the intended position in the order after the base character, that the prebase character will end up. Thus `@order` shall not be 0. These characters are part of the run prefix. If such characters are typed then, in order to give the run a base character after which characters can be sorted, an appropriate base character, such as a dotted circle, is inserted into the output run, until a real base character has been typed. A value of `"false"` indicates that the character is not a prebase.
1470
1471There is no `@error` attribute.
1472
1473For `@from` attributes with a match string length greater than 1, the sort key information (`@order`, `@tertiary`, `@tertiary_base`, `@prebase`) may consist of a space separated list of values, one for each element matched. The last value is repeated to fill out any missing values. Such a list may not contain more values than there are elements in the `@from` attribute:
1474
1475```
1476if len(@from) < len(@list) then error
1477else
1478    while len(@from) > len(@list)
1479        append lastitem(@list) to @list
1480    endwhile
1481endif
1482```
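
The same expansion can be written as a short Python function. This is a sketch only: the name `expand_list` and its `default` parameter (used when the attribute is missing) are illustrative and not part of the specification.

```python
def expand_list(value, n_elements, default):
    """Split a space separated attribute value into one item per matched element."""
    items = value.split() if value else [default]
    if len(items) > n_elements:
        raise ValueError(f"{len(items)} values for {n_elements} matched elements")
    # Repeat the last item to fill out any missing values.
    return items + [items[-1]] * (n_elements - len(items))
```

With a three-element `@from`, `expand_list("10 55", 3, "0")` repeats the final weight to give `["10", "55", "55"]`.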
1483
1484**Example**
1485
For example, consider the Northern Thai (nod-Lana) word: ᨡ᩠ᩅᩫ᩶ 'roasted'. This is ideally encoded as the following:
1487
1488| name | _kha_ | _sakot_ | _wa_ | _o_  | _t2_ |
1489|------|-------|---------|------|------|------|
1490| code | 1A21  | 1A60    | 1A45 | 1A6B | 1A76 |
1491| ccc  | 0     | 9       | 0    | 0    | 230  |
1492
1493(That sequence is already in NFC format.)
1494
1495Some users may type the upper component of the vowel first, and the tone before or after the lower component. Thus someone might type it as:
1496
1497| name | _kha_ | _o_  | _t2_ | _sakot_ | _wa_ |
1498|------|-------|------|------|---------|------|
1499| code | 1A21  | 1A6B | 1A76 | 1A60    | 1A45 |
1500| ccc  | 0     | 0    | 230  | 9       | 0    |
1501
1502The Unicode NFC format of that typed value reorders to:
1503
1504| name | _kha_ | _o_  | _sakot_ | _t2_ | _wa_ |
1505|------|-------|------|---------|------|------|
1506| code | 1A21  | 1A6B | 1A60    | 1A76 | 1A45 |
1507| ccc  | 0     | 0    | 9       | 230  | 0    |
1508
1509Finally, the user might also type in the sequence with the tone _after_ the lower component.
1510
1511| name | _kha_ | _o_  | _sakot_ | _wa_ | _t2_ |
1512|------|-------|------|---------|------|------|
1513| code | 1A21  | 1A6B | 1A60    | 1A45 | 1A76 |
1514| ccc  | 0     | 0    | 9       | 0    | 230  |
1515
1516(That sequence is already in NFC format.)
1517
1518We want all of these sequences to end up ordered as the first. To do this, we use the following rules:
1519
1520```xml
1521<reorder from="\u1A60" order="127" />      <!-- max possible order -->
1522<reorder from="\u1A6B" order="42" />
1523<reorder from="[\u1A75-\u1A79]" order="55" />
1524<reorder before="\u1A6B" from="\u1A60\u1A45" order="10" />
1525<reorder before="\u1A6B[\u1A75-\u1A79]" from="\u1A60\u1A45" order="10" />
1526<reorder before="\u1A6B" from="\u1A60[\u1A75-\u1A79]\u1A45" order="10 55 10" />
1527```
1528
1529The first reorder is the default ordering for the _sakot_ which allows for it to be placed anywhere in a sequence, but moves any non-consonants that may immediately follow it, back before it in the sequence. The next two rules give the orders for the top vowel component and tone marks respectively. The next three rules give the _sakot_ and _wa_ characters a primary order that places them before the _o_. Notice particularly the final reorder rule where the _sakot_+_wa_ is split by the tone mark. This rule is necessary in case someone types into the middle of previously normalized text.
1530
1531`<reorder>` elements are priority ordered based first on the length of string their `@from` attribute matches and then the sum of the lengths of the strings their `@before` and `@after` attributes match.
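
To see how the rules and their priority interact, the following Python sketch applies the six Northern Thai rules above to each of the typed sequences. It rests on several assumptions that are not stated in the text: regular expression character classes stand in for UnicodeSets, matching resumes after the end of each `@from` match, and a run simply starts at every character with order 0 (the example uses no `@tertiary` or `@prebase` values). The names `RULES`, `assign_orders`, and `reorder` are hypothetical.

```python
import re

# (before, from, list of order weights), copied from the rules above.
RULES = [
    ("", "\u1A60", [127]),
    ("", "\u1A6B", [42]),
    ("", "[\u1A75-\u1A79]", [55]),
    ("\u1A6B", "\u1A60\u1A45", [10]),
    ("\u1A6B[\u1A75-\u1A79]", "\u1A60\u1A45", [10]),
    ("\u1A6B", "\u1A60[\u1A75-\u1A79]\u1A45", [10, 55, 10]),
]

def assign_orders(text):
    orders = [0] * len(text)
    i = 0
    while i < len(text):
        best = None
        for before, frm, weights in RULES:
            m = re.compile(frm).match(text, i)
            b = re.search(f"(?:{before})\\Z", text[:i])
            if m and b:
                # Priority: length of the @from match, then of the @before match.
                rank = (len(m.group()), len(b.group()))
                if best is None or rank > best[0]:
                    best = (rank, weights)
        if best is None:
            i += 1  # unmatched characters keep order 0
            continue
        (n, _), weights = best
        for k in range(n):
            orders[i + k] = weights[min(k, len(weights) - 1)]  # repeat last weight
        i += n
    return orders

def reorder(text):
    orders = assign_orders(text)
    out, run = [], []
    for i, ch in enumerate(text):
        if orders[i] == 0 and run:  # a new base closes the previous run
            out += sorted(run)
            run = []
        run.append((orders[i], i, ch))
    out += sorted(run)
    return "".join(ch for _, _, ch in out)
```

Under these assumptions, each of the four input orders shown in the tables above comes out as `1A21 1A60 1A45 1A6B 1A76`.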
1532
If a layout has two `<reorders>` elements, e.g. from importing one and specifying the second, then `<reorder>` elements are merged. The `@from` string in a `<reorder>` element describes a set of strings that it matches. This also holds for the `@before` and `@after` attributes. The intersection of two `<reorder>` elements consists of the intersections of their `@from`, `@before` and `@after` string sets. It is illegal for the intersection between any two `<reorder>` elements in the same `<reorders>` element to be non-empty, although implementors are encouraged to have pity on layout authors when reporting such errors, since they can be hard to track down.
1534
If two `<reorder>` elements in two different `<reorders>` elements have a non-empty intersection, then they are split and merged. They are split such that where there were two `<reorder>` elements, there are, in effect (but not actuality), three elements consisting of:
1536
1537* `@from`, `@before`, `@after` that match the intersection of the two rules. The other attributes are merged, as described below.
1538* `@from`, `@before`, `@after` that match the set of strings in the first rule not in the intersection with the other attributes from the first rule.
1539* `@from`, `@before`, `@after` that match the set of strings in the second rule not in the intersection, with the other attributes from the second rule.
1540
1541When merging the other attributes, the second rule is taken to have priority (occurring later in the layout description file). Where the second rule does not define the value for a character but the first does, it is taken from the first rule, otherwise it is taken from the second rule.
1542
1543Notice that it is possible for two rules to match the same string, but for them not to merge because the distribution of the string across `@before`, `@from`, and `@after` is different. For example:
1544
1545```xml
1546<reorder before="ab" from="cd" after="e" />
1547```
1548
1549would not merge with:
1550
1551```xml
1552<reorder before="a" from="bcd" after="e" />
1553```
1554
1555When two `<reorders>` elements merge as the result of an import, the resulting `reorder` elements are sorted into priority order for matching.
1556
1557Consider this fragment from a shared reordering for the Myanmar script:
1558
1559```xml
1560<!-- medial-r -->
1561<reorder from="\u103C" order="20" />
1562
1563<!-- [medial-wa or shan-medial-wa] -->
1564<reorder from="[\u103D\u1082]" order="25" />
1565
1566<!-- [medial-ha or shan-medial-wa]+asat = Mon asat -->
1567<reorder from="[\u103E\u1082]\u103A" order="27" />
1568
1569<!-- [medial-ha or mon-medial-wa] -->
1570<reorder from="[\u103E\u1060]" order="27" />
1571
1572<!-- [e-vowel or shan-e-vowel] -->
1573<reorder from="[\u1031\u1084]" order="30" />
1574
1575<reorder from="[\u102D\u102E\u1033-\u1035\u1071-\u1074\u1085\u109D\uA9E5]" order="35" />
1576```
1577
1578A particular Myanmar keyboard layout can have this `reorders` element:
1579
1580```xml
1581<reorders>
1582    <!-- Kinzi -->
1583    <reorder from="\u1004\u103A\u1039" order="-1" />
1584
1585    <!-- e-vowel -->
    <reorder from="\u1031" prebase="true" />

    <!-- medial-r -->
    <reorder from="\u103C" prebase="true" />
1590</reorders>
1591```
1592
The effect of this is that the _e-vowel_ will be identified as a prebase and will have an order of 30. Likewise a _medial-r_ will be identified as a prebase and will have an order of 20. Notice that a _shan-e-vowel_ will not be identified as a prebase (even if it should be!). The _kinzi_ is described in the layout since it moves something across a run boundary. By separating such movements (prebase or moving to in front of a base) from the shared ordering rules, the shared ordering rules become a self-contained combining order description that can be used in other keyboards or even in other contexts than keyboarding.
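
The split-and-merge behaviour for the _e-vowel_ rules can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each rule is modelled as the set of strings its `@from` matches plus a dictionary of attributes, `@before` and `@after` are ignored, and the function name `split_and_merge` is hypothetical.

```python
def split_and_merge(first, second):
    """Each rule is (set of matched strings, attribute dict); `second` has priority."""
    f_set, f_attrs = first
    s_set, s_attrs = second
    common = f_set & s_set
    parts = [
        (common, {**f_attrs, **s_attrs}),  # intersection: attributes merged
        (f_set - common, dict(f_attrs)),   # strings only in the first rule
        (s_set - common, dict(s_attrs)),   # strings only in the second rule
    ]
    return [(strings, attrs) for strings, attrs in parts if strings]

shared = ({"\u1031", "\u1084"}, {"order": 30})  # [e-vowel or shan-e-vowel]
layout = ({"\u1031"}, {"prebase": True})        # e-vowel in the layout
merged = split_and_merge(shared, layout)
```

Here `merged` contains one rule for U+1031 carrying both `order` 30 and the prebase flag, and one rule for U+1084 carrying only `order` 30, which is why the _shan-e-vowel_ is not treated as a prebase.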
1594
1595* * *
1596
1597### 5.21 <a name="Element_final" href="#Element_final">Element: transform final</a>
1598
1599The final transform is applied after the reorder transform. It executes in a similar way to the simple transform with the settings ignored, as if there were no settings in the `<settings>` element.
1600
1601**Example**
1602
1603This is an example from Khmer where split vowels are combined after reordering.
1604
1605```xml
1606<transforms type="final">
1607    <transform from="\u17C1\u17B8" to="\u17BE" />
1608    <transform from="\u17C1\u17B6" to="\u17C4" />
1609</transforms>
1610```
1611
1612Another example allows a keyboard implementation to alert or stop people typing two lower vowels in a Burmese cluster:
1613
1614```xml
1615<transform from="[\u102F\u1030\u1048\u1059][\u102F\u1030\u1048\u1059]" error="fail" />
1616```
1617
1618* * *
1619
1620### 5.22 <a name="Element_backspaces" href="#Element_backspaces">Element: backspaces</a>
1621
1622The backspace transform is an optional transform that is not applied on input of normal characters, but is only used to perform extra backspace modifications to previously committed text.
1623
Keyboarding applications typically work, but are not required to work, in one of two modes:
1625
1626**_text entry_**
1627
1628> text entry happens while a user is typing new text. A user typically wants the backspace key to undo whatever they last typed, whether or not they typed things in the 'right' order.
1629
1630**_text editing_**
1631
1632> text editing happens when a user moves the cursor into some previously entered text which may have been entered by someone else. As such, there is no way to know in which order things were typed, but a user will still want appropriate behaviour when they press backspace. This may involve deleting more than one character or replacing a sequence of characters with a different sequence.
1633
1634In the text entry mode, there is no need for any special description of backspace behaviour. A keyboarding application will typically keep a history of previous output states and just revert to the previous state when backspace is hit.
1635
1636In text editing mode, different keyboard layouts may behave differently in the same textual context. The backspace transform allows the keyboard layout to specify the effect of pressing backspace in a particular textual context. This is done by specifying a set of backspace rules that match a string before the cursor and replace it with another string. The rules are expressed as `backspace` elements encapsulated in a `backspaces` element.
1637
1638**Syntax**
1639
1640```xml
1641<backspaces>
1642    {a set of backspace elements}
</backspaces>
1644```
1645
1646> <small>
1647>
1648> Parents: [keyboard](#Element_keyboard)
1649> Children: [backspace](#Element_backspace)
> Occurrence: optional, single
1651>
1652> </small>
1653
1654* * *
1655
1656### 5.23 <a name="Element_backspace" href="#Element_backspace">Element: backspace</a>
1657
1658**Syntax**
1659
1660```xml
1661<backspace from="{combination of characters}" [to="{output}"]
1662   [before="{look-behind required match}"]
1663   [after="{look-ahead required match}"]
1664   [error="fail"] />
1665```
1666
1667> <small>
1668>
1669> Parents: [backspaces](#Element_backspaces)
1670> Children: _none_
> Occurrence: required, multiple
1672>
1673> </small>
1674
The `backspace` element has the same `@before`, `@from`, `@after`, `@to`, and `@error` attributes as the `transform` element. The `@to` is optional with `backspace`.
1676
1677**Example**
1678
1679For example, consider deleting a Devanagari ksha:
1680
1681```xml
1682<backspaces>
    <backspace from="\u0915\u094D\u0937"/>
1684</backspaces>
1685```
1686
1687Here there is no `@to` attribute since the whole string is being deleted. This is not uncommon in the backspace transforms.
1688
1689A more complex example comes from a Burmese visually ordered keyboard:
1690
1691```xml
1692<backspaces>
1693    <!-- Kinzi -->
1694    <backspace from="[\u1004\u101B\u105A]\u103A\u1039" />
1695
1696    <!-- subjoined consonant -->
1697    <backspace from="\u1039[\u1000-\u101C\u101E\u1020\u1021\u1050\u1051\u105A-\u105D]" />
1698
1699    <!-- tone mark -->
1700    <backspace from="\u102B\u103A" />
1701
1702    <!-- Handle prebases -->
1703    <!-- diacritics stored before e-vowel -->
1704    <backspace from="[\u103A-\u103F\u105E-\u1060\u1082]\u1031" to="\u1031" />
1705
1706    <!-- diacritics stored before medial r -->
1707    <backspace from="[\u103A-\u103B\u105E-\u105F]\u103C" to="\u103C" />
1708
1709    <!-- subjoined consonant before e-vowel -->
1710    <backspace from="\u1039[\u1000-\u101C\u101E\u1020\u1021]\u1031" to="\u1031" />
1711
1712    <!-- base consonant before e-vowel -->
1713    <backspace from="[\u1000-\u102A\u103F-\u1049\u104E]\u1031" to="\uFDDF\u1031" />
1714
1715    <!-- subjoined consonant before medial r -->
1716    <backspace from="\u1039[\u1000-\u101C\u101E\u1020\u1021]\u103C" to="\u103C" />
1717
1718    <!-- base consonant before medial r -->
1719    <backspace from="[\u1000-\u102A\u103F-\u1049\u104E]\u103C" to="\uFDDF\u103C" />
1720
1721    <!-- delete lone medial r or e-vowel -->
1722    <backspace from="\uFDDF[\u1031\u103C]" />
1723</backspaces>
1724```
1725
1726The above example is simplified, and doesn't fully handle the interaction between medial-r and e-vowel.
1727
1728The character \\uFDDF does not represent a literal character, but is instead a special placeholder, a "filler string". When a keyboard implementation handles a user pressing a key that inserts a prebase character, it also has to insert a special filler string before the prebase to ensure that the prebase character does not combine with the previous cluster. See the reorder transform for details. The precise filler string is implementation dependent. Rather than requiring keyboard layout designers to know what the filler string is, we reserve a special character that the keyboard layout designer may use to reference this filler string. It is up to the keyboard implementation to, in effect, replace that character with the filler string.
1729
The first three transforms above delete various ligatures with a single keypress. The other transforms handle prebase characters, of which there are two in this Burmese keyboard. These transforms delete the characters preceding the prebase character up to the base, which is replaced with the prebase filler string representing a null base. Finally, the prebase filler string + prebase is deleted as a unit.
1731
1732The backspace transform is much like other transforms except in its processing model. If we consider the same transform as in the simple transform example, but as a backspace:
1733
1734```xml
1735<backspace before="X" from="Y" after="Z" to="B"/>
1736```
1737
1738This would transform the string:
1739
1740```
1741XYZ → XBZ
1742```
1743
1744If we mark where the current match position is before and after the transform we see:
1745
1746```
1747X Y | Z → X B | Z
1748```
1749
1750Whereas a simple or final transform would then run other transforms in the transform list, advancing the processing position until it gets to the end of the string, the backspace transform only matches a single backspace rule and then finishes.
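
A minimal Python sketch of this single-rule processing model follows, using three of the Burmese rules shown earlier. Several details are assumptions made for illustration: `@from` values are treated as regular expressions, `@before`, `@after`, and `@error` are ignored, the rule with the longest match is preferred, and one code point is deleted when no rule matches. The names `RULES` and `press_backspace` are hypothetical.

```python
import re

# (from, to) pairs taken from the Burmese example above.
RULES = [
    ("[\u1004\u101B\u105A]\u103A\u1039", ""),                      # Kinzi
    ("[\u1000-\u102A\u103F-\u1049\u104E]\u1031", "\uFDDF\u1031"),  # base before e-vowel
    ("\uFDDF[\u1031\u103C]", ""),                                  # lone e-vowel or medial r
]

def press_backspace(text_before_cursor, rules=RULES):
    """Apply at most one backspace rule to the text that ends at the cursor."""
    best = None
    for frm, to in rules:
        m = re.search(f"(?:{frm})\\Z", text_before_cursor)
        # Prefer the rule whose match reaches furthest back (the longest match).
        if m and (best is None or m.start() < best[0].start()):
            best = (m, to)
    if best is None:
        return text_before_cursor[:-1]  # assumed fallback: drop one code point
    m, to = best
    return text_before_cursor[:m.start()] + to
```

Pressing backspace on a base consonant followed by the _e-vowel_ first leaves the filler placeholder plus the vowel, and a second press removes both as a unit.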
1751
1752* * *
1753
1754## 6 <a name="Element_Heirarchy_Platform_File" href="#Element_Heirarchy_Platform_File">Element Hierarchy - Platform File</a>
1755
There is a separate XML structure for platform-specific configuration elements. The most notable component is a mapping from the hardware key codes to the ISO layout positions for that platform.
1757
1758### 6.1 <a name="Element_platform" href="#Element_platform">Element: platform</a>
1759
1760This is the top level element. This element contains a set of elements defined below. A document shall only contain a single instance of this element.
1761
1762**Syntax**
1763
1764```xml
1765<platform>
1766    {platform-specific elements}
1767</platform>
1768```
1769
1770> <small>
1771>
1772> Parents: _none_
1773> Children: [hardwareMap](#Element_hardwareMap)
> Occurrence: required, single
1775>
1776> </small>
1777
1778
1779### 6.2 <a name="Element_hardwareMap" href="#Element_hardwareMap">Element: hardwareMap</a>
1780
1781This element must have a `platform` element as its parent. This element contains a set of map elements defined below. A document shall only contain a single instance of this element.
1782
1783**Syntax**
1784
1785```xml
1786<platform>
1787    <hardwareMap>
1788        {a set of map elements}
1789    </hardwareMap>
1790</platform>
1791```
1792
1793> <small>
1794>
1795> Parents: [platform](#Element_platform)
1796> Children: [map](#Element_hardwareMap_map)
> Occurrence: optional, single
1798>
1799> </small>
1800
1801### 6.3 <a name="Element_hardwareMap_map" href="#Element_hardwareMap_map">Element: map</a>
1802
1803This element must have a `hardwareMap` element as its parent. This element maps between a hardware keycode and the corresponding ISO layout position of the key.
1804
1805**Syntax**
1806
1807```xml
1808<map keycode="{hardware keycode}" iso="{ISO layout position}" />
1809```
1810
1811> <small>
1812>
1813> Parents: [hardwareMap](#Element_hardwareMap)
1814> Children: _none_
> Occurrence: required, multiple
>
> </small>
1817
1818_Attribute:_ `keycode` (required)
1819
1820> The hardware key code value of the key. This value is an integer which is provided by the keyboard driver.
1821
1822_Attribute:_ `iso` (required)
1823
1824> The corresponding position of a key using the ISO layout convention where rows are identified by letters and columns are identified by numbers. For example, "D01" corresponds to the "Q" key on a US keyboard. (See the definition at the beginning of the document for a diagram).
1825
1826**Example**
1827
1828```xml
1829<platform>
1830    <hardwareMap>
1831        <map keycode="2" iso="E01" />
1832        <map keycode="3" iso="E02" />
1833        <map keycode="4" iso="E03" />
1834        <map keycode="5" iso="E04" />
1835        <map keycode="6" iso="E05" />
1836        <map keycode="7" iso="E06" />
1837        <map keycode="41" iso="E00" />
1838    </hardwareMap>
1839</platform>
1840```
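
A platform file of this shape can be read into a lookup table with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the function name `load_hardware_map` and the abbreviated sample document are not part of the specification.

```python
import xml.etree.ElementTree as ET

PLATFORM_XML = """
<platform>
    <hardwareMap>
        <map keycode="2" iso="E01" />
        <map keycode="3" iso="E02" />
        <map keycode="41" iso="E00" />
    </hardwareMap>
</platform>
"""

def load_hardware_map(xml_text):
    """Return a dict from integer hardware keycode to ISO layout position."""
    root = ET.fromstring(xml_text)  # root is the <platform> element
    return {
        int(m.get("keycode")): m.get("iso")
        for m in root.iterfind("hardwareMap/map")
    }

hardware_map = load_hardware_map(PLATFORM_XML)
```

The resulting `hardware_map` lets an implementation translate a keycode reported by the keyboard driver, such as 41, into its ISO position, here `E00`.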
1841
1842* * *
1843
1844## 7 <a name="Invariants" href="#Invariants">Invariants</a>
1845
Beyond what the DTD imposes, certain other restrictions are imposed on the data.
1847
1.  For a given platform, every `map[@iso]` value must be in the hardwareMap if there is one (`_keycodes.xml`).
2.  Every `map[@base]` value must also be in a `base[@base]` value.
3.  No `keyMap[@modifiers]` value can overlap with another `keyMap[@modifiers]` value.
    * e.g., you can't have `"RAlt Ctrl"` in one `keyMap`, and `"Alt Shift"` in another (because Alt = RAltLAlt).
4.  Every sequence of characters in a `transform[@from]` value must be a concatenation of two or more `map[@to]` values.
    * e.g., with `<transform from="xyz" to="q">` there must be some map values to get there, such as `<map... to="xy">` & `<map... to="z">`
5.  If the base and chars values for `modifiers=""` are all identical, and there are no longpresses, that `keyMap` must not appear (??)
6.  There will never be overlaps among modifier values.
7.  A modifier set will never have ? (optional) on all values
    * e.g., you'll never have `RCtrl?Caps?LShift?`
8.  Every `base[@base]` value must be unique.
9.  A `modifier` attribute value will always be minimal, observing the following simplification rules.
1860
1861| Notation                                 | Notes |
1862|------------------------------------------|-------|
1863| Lower case character (eg. _x_ )          | Interpreted as any combination of modifiers. <br/> (eg. _x_ = CtrlShiftOption) |
1864| Upper-case character (eg. _Y_ )          | Interpreted as a single modifier key (which may or may not have a L and R variant) <br/> (eg. _Y_ = Ctrl, _RY_ = RCtrl, etc..) |
1865| Y? ⇔ Y ∨ ∅ <br/> Y ⇔ LY ∨ RY ∨ LYRY | Eg. Opt? ⇔ ROpt ∨ LOpt ∨ ROptLOpt ∨ ∅ <br/> Eg. Opt ⇔ ROpt ∨ LOpt ∨ ROptLOpt |
1866
1867
1868| Axiom                                       | Example                                      |
1869|---------------------------------------------|----------------------------------------------|
1870| xY ∨ x ⇒ xY?                              | OptCtrlShift OptCtrl → OptCtrlShift?         |
1871| xRY ∨ xY? ⇒ xY? <br/> xLY ∨ xY? ⇒ xY?   | OptCtrlRShift OptCtrlShift? → OptCtrlShift?  |
1872| xRY? ∨ xY ⇒ xY? <br/> xLY? ∨ xY ⇒ xY?   | OptCtrlRShift? OptCtrlShift → OptCtrlShift?  |
1873| xRY? ∨ xY? ⇒ xY? <br/> xLY? ∨ xY? ⇒ xY? | OptCtrlRShift? OptCtrlShift? → OptCtrlShift? |
| xRY ∨ xY ⇒ xY <br/> xLY ∨ xY ⇒ xY       | OptCtrlRShift OptCtrlShift → OptCtrlShift    |
| xLY?RY? ⇒ xY?                             | OptRCtrl?LCtrl? → OptCtrl?                   |
| xLY? ∨ xLY ⇒ xLY?                         |                                              |
| xY? ∨ xY ⇒ xY?                            |                                              |
| xY? ∨ x ⇒ xY?                             |                                              |
| xLY? ∨ x ⇒ xLY?                           |                                              |
| xLY ∨ x ⇒ xLY?                            |                                              |
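
The notation and axioms above can be checked mechanically by expanding each modifier combination into the set of physical key states it stands for. The Python sketch below does this under stated assumptions: a combination is given as a list of tokens, the key names in `SIDED` are an illustrative list of modifiers with L and R variants, and the function names are hypothetical.

```python
SIDED = ("Alt", "Ctrl", "Opt", "Shift")  # assumed keys that have L and R variants

def expand_token(tok):
    """Return the set of key states (frozensets of physical keys) for one token."""
    name, optional = tok.rstrip("?"), tok.endswith("?")
    if name in SIDED:
        left, right = "L" + name, "R" + name
        # Y <=> LY or RY or LYRY
        states = {frozenset({left}), frozenset({right}), frozenset({left, right})}
    else:
        states = {frozenset({name})}
    if optional:
        states.add(frozenset())  # Y? <=> Y or nothing
    return states

def expand(combo):
    """Expand a combination such as ["Opt", "Ctrl", "Shift?"] into key states."""
    result = {frozenset()}
    for tok in combo:
        result = {a | b for a in result for b in expand_token(tok)}
    return result

def overlaps(combo_a, combo_b):
    return bool(expand(combo_a) & expand(combo_b))
```

Two notations are equivalent when they expand to the same set, so the first axiom corresponds to `expand(["Opt", "Ctrl", "Shift"]) | expand(["Opt", "Ctrl"]) == expand(["Opt", "Ctrl", "Shift?"])`, and `overlaps(["RAlt"], ["Alt"])` reports the kind of clash that the invariants forbid.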
1881
1882* * *
1883
1884## 8 <a name="Data_Sources" href="#Data_Sources">Data Sources</a>
1885
1886Here is a list of the data sources used to generate the initial key map layouts:
1887
1888##### <a name="Key_Map_Data_Sources" href="#Key_Map_Data_Sources">Key Map Data Sources</a>
1889
1890| Platform | Source | Notes |
1891|----------|--------|-------|
1892| Android  | Android 4.0 - Ice Cream Sandwich ([https://source.android.com/source/downloading.html](https://source.android.com/source/downloading.html)) | Parsed layout files located in packages/inputmethods/LatinIME/java/res |
| ChromeOS | XKB ([https://www.x.org/wiki/XKB](https://www.x.org/wiki/XKB)) | The ChromeOS represents a very small subset of the keyboards available from XKB. |
1894| Mac OSX  | Ukelele bundled System Keyboards ([https://software.sil.org/ukelele/](https://software.sil.org/ukelele/)) | These layouts date from Mac OSX 10.4 and are therefore a bit outdated |
| Windows  | Generated .klc files from the Microsoft Keyboard Layout Creator ([https://support.microsoft.com/en-us/help/823010/the-microsoft-keyboard-layout-creator](https://support.microsoft.com/en-us/help/823010/the-microsoft-keyboard-layout-creator)) | |
1896
1897* * *
1898
1899## 9 <a name="Keyboard_IDs" href="#Keyboard_IDs">Keyboard IDs</a>
1900
There is a set of subtags that help identify the keyboards. Each of these is used after the `"t-k0"` subtags to help identify the keyboards. The first tag appended is a mandatory platform tag followed by zero or more tags that help differentiate the keyboard from others with the same locale code.
1902
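The structure described above can be sketched as a small parser. This is only an illustration, not part of the specification: the `KeyboardId` type, the `parse_keyboard_id` function, and the spellings in `KNOWN_PLATFORMS` are assumptions made for the example. The sketch also tolerates a missing platform subtag, because some example IDs in this document (such as `en-t-k0-extended`) omit it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: this section does not list the platform subtag spellings;
# these values are illustrative only.
KNOWN_PLATFORMS = frozenset({"android", "chromeos", "osx", "windows"})


@dataclass(frozen=True)
class KeyboardId:
    locale: str               # language tag that precedes "-t-k0"
    platform: Optional[str]   # first k0 subtag, when it names a known platform
    subtags: Tuple[str, ...]  # remaining differentiating subtags, in order


def parse_keyboard_id(keyboard_id: str) -> KeyboardId:
    """Split a keyboard ID into locale, platform, and remaining subtags."""
    locale, marker, rest = keyboard_id.partition("-t-k0-")
    if not marker or not locale or not rest:
        raise ValueError(f"not a keyboard ID: {keyboard_id!r}")
    subtags = rest.lower().split("-")
    # The platform subtag, if present, is always the first one after "t-k0".
    platform = subtags.pop(0) if subtags[0] in KNOWN_PLATFORMS else None
    return KeyboardId(locale, platform, tuple(subtags))
```

For example, `parse_keyboard_id("bg-t-k0-chromeos-2011")` yields the locale `bg`, the platform `chromeos`, and the remaining subtags `("2011",)`.
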
### 9.1 <a name="Principles_for_Keyboard_Ids" href="#Principles_for_Keyboard_Ids">Principles for Keyboard Ids</a>

The following are the design principles for the IDs.

1. BCP47 compliant.
   1. E.g., `en-t-k0-extended`.
2. Use the minimal language ID based on `likelySubtag`s.
   1. E.g., instead of `en-US-t-k0-xxx`, use `en-t-k0-xxx`. Because the likely subtags data contains `<likelySubtag from="en" to="en_Latn_US"/>`, `en-US` minimizes to `en`.
   2. The data is in [https://github.com/unicode-org/cldr/releases/tag/latest/common/supplemental/likelySubtags.xml](https://github.com/unicode-org/cldr/releases/tag/latest/common/supplemental/likelySubtags.xml)
3. The platform goes first, if it exists. If a keyboard on the platform changes over time, both versions are dated, e.g., `bg-t-k0-chromeos-2011`. When selecting a keyboard, an ID with no date means the latest one.
4. Only keyboards that differ from the "standard for each platform" receive additional subtags. That is, for each language on a platform, there will be a keyboard with no subtags other than the platform. Subtags with common semantics across platforms are used, such as `-extended`, `-phonetic`, `-qwerty`, `-qwertz`, `-azerty`, …
5. To keep each subtag within 8 letters, abbreviations are reused that are already in [bcp47](https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/) -u/-t extensions and in [language-subtag-registry](https://www.iana.org/assignments/language-subtag-registry) variants, e.g., for Traditional use `-trad` or `-traditio` (both exist in [bcp47](https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/)).
6. Multiple languages cannot be indicated, so the predominant target is used.
   1. For Finnish + Sami, use `fi-t-k0-smi` or `extended-smi`.
7. In some cases, there are multiple subtags, such as `en-US-t-k0-chromeos-intl-altgr.xml`.
8. Otherwise, platform names are used as a guide.

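Principles 2, 3, and 5 can be illustrated with a short sketch. It is not a conformant implementation: `minimal_language` is a simplification of likely-subtags minimization that only consults the single `en` entry quoted above, the four-digit-year date format is assumed from the `2011` example, and all function names are hypothetical. The 3 to 8 alphanumeric limit reflects the subtag syntax of the BCP 47 `-t-` extension.

```python
import re
from typing import List, Optional

# Excerpt of likely-subtags data; only the "en" entry is quoted in the text above.
LIKELY_SUBTAGS = {"en": "en_Latn_US"}

# Subtag shape allowed after a -t- extension key such as k0: 3 to 8 alphanumerics.
K0_SUBTAG = re.compile(r"[A-Za-z0-9]{3,8}")
# Assumption: dates are four-digit years, as in "bg-t-k0-chromeos-2011".
YEAR = re.compile(r"\d{4}")


def minimal_language(locale: str) -> str:
    """Principle 2 (simplified): drop script/region subtags implied by likely subtags."""
    parts = locale.replace("_", "-").split("-")
    lang = parts[0].lower()
    likely = LIKELY_SUBTAGS.get(lang)
    if likely is None:
        return locale  # no data for this language: leave the tag unchanged
    implied = {p.lower() for p in likely.split("_")[1:]}
    kept = [p for p in parts[1:] if p.lower() not in implied]
    return "-".join([lang] + kept)


def fits_subtag_limit(subtag: str) -> bool:
    """Principle 5: a subtag must be 3 to 8 alphanumeric characters."""
    return K0_SUBTAG.fullmatch(subtag) is not None


def resolve_keyboard(requested: str, available: List[str]) -> Optional[str]:
    """Principle 3: an undated ID selects the latest dated keyboard."""
    if requested in available:
        return requested
    prefix = requested + "-"
    dated = [a for a in available
             if a.startswith(prefix) and YEAR.fullmatch(a[len(prefix):])]
    return max(dated, key=lambda a: int(a[len(prefix):])) if dated else None
```

With a hypothetical list of available keyboards `["bg-t-k0-chromeos-2011", "bg-t-k0-chromeos-2013"]`, a request for `bg-t-k0-chromeos` resolves to the 2013 entry, while `minimal_language("en-US")` returns `en` and `fits_subtag_limit("traditional")` is false.
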
* * *

## 10 <a name="Platform_Behaviors_in_Edge_Cases" href="#Platform_Behaviors_in_Edge_Cases">Platform Behaviors in Edge Cases</a>

| Platform | No modifier combination match is available | No map match is available for key position | Transform fails (i.e., \^d is pressed when that transform does not exist) |
|----------|--------------------------------------------|--------------------------------------------|---------------------------------------------------------------------------|
| ChromeOS | Fall back to base | Fall back to the character in a keyMap with the same "level" of modifier combination. If this character does not exist, fall back to the (n-1) level. (This is handled on the data-generation side.) <br/> In the spec: No output | No output at all |
| Mac OSX  | Fall back to base (unless the combination is some sort of keyboard shortcut, e.g., cmd-c) | No output | Both keys are output separately |
| Windows  | No output | No output | Both keys are output separately |

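Two columns of the table above can be restated as a minimal sketch, which may help when emulating one platform's behavior on another. The `Platform` enum and both function names are hypothetical, and Mac keyboard shortcuts (such as cmd-c) are deliberately not modeled.

```python
from enum import Enum


class Platform(Enum):
    CHROMEOS = "ChromeOS"
    MAC_OSX = "Mac OSX"
    WINDOWS = "Windows"


def output_without_modifier_match(platform: Platform, base_output: str) -> str:
    """Text produced when no keyMap matches the pressed modifier combination."""
    if platform is Platform.WINDOWS:
        return ""  # Windows: no output
    # ChromeOS and Mac OSX fall back to the base map.
    # Mac keyboard shortcuts are outside the scope of this sketch.
    return base_output


def output_on_failed_transform(platform: Platform, first: str, second: str) -> str:
    """Text produced when `first` starts a transform that `second` does not complete."""
    if platform is Platform.CHROMEOS:
        return ""  # ChromeOS: no output at all
    # Mac OSX and Windows: both keys are output separately.
    return first + second
```

For instance, pressing `^` followed by `d` when no such transform exists gives `"^d"` on Windows and Mac OSX, and an empty string on ChromeOS.
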
* * *

Copyright © 2001–2021 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://unicode.org/copyright.html) apply.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.