---
layout: default
title: BiDi Algorithm
nav_order: 2
parent: Transforms
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# BiDi Algorithm
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Bidirectional text consists of mainly right-to-left text with some left-to-right
nested segments (such as an Arabic text with some information in English), or
vice versa (such as an English letter with a Hebrew address nested within it).
The predominant direction is called the global orientation.

Languages involving bidirectional text are used mainly in the Middle East. They
include Arabic, Urdu, Persian, Hebrew, and Yiddish.

In such a language, the general flow of text proceeds horizontally from right to
left, but numbers are written from left to right, the same way as they are
written in English. In addition, if some text (addresses, acronyms, or
quotations) in English or another left-to-right language is embedded, it is also
written from left to right.

*Libraries that perform a bidirectional algorithm and reorder strings
accordingly are sometimes called "Storage Layout Engines". ICU's BiDi (ubidi.h)
and shaping (ushape.h) APIs can be used at the core of such "Storage Layout
Engines".*

## Countries with Languages that Require Bidirectional Scripting

There are over 600 million people whose languages are written right-to-left, including
Persian and Urdu, which use the Arabic script with additional characters.

| Language | Countries (examples)                                                    |
|----------|-------------------------------------------------------------------------|
| Arabic   | Egypt, Jordan, Morocco, Saudi Arabia, ... Middle East & North Africa    |
| Persian  | Iran, Afghanistan                                                       |
| Urdu     | India, Pakistan                                                         |
| Hebrew   | Israel                                                                  |
| Yiddish  | Israel, North America, South America, Russia, Europe                    |

This list of languages is far from complete. Other languages with RTL scripts include
Divehi (Maldives), Kurdish (Iraq), Kashmiri (India), Sindhi (Pakistan and India), Uighur (China), and Pashto (Afghanistan).

## Logical Order versus Visual Order

When reading bidirectional text, whenever the eye of the experienced reader
encounters an embedded segment, it "automatically" jumps to the other end of the
segment and reads it in the opposite direction. The sequence in which the
characters are pronounced is thus a logical sequence which differs from the
visual sequence in which they are presented on the screen or page.

The logical order of bidirectional text is also the order in which it is usually
keyed, and in which it is stored in memory.

Consider the following example, where Arabic or Hebrew letters are represented
by uppercase English letters and English text is represented by lowercase
letters:

    english CIBARA text

The English letter h is visually followed by the Arabic letter C, but logically
h is followed by the rightmost letter A. The next letter, in logical order, will
be R. In other words, the logical and storage order of the same text would be:

    english ARABIC text

Text is stored and processed in logical order to make processing feasible: a
contiguous substring of logical-order text (e.g., from a copy-and-paste operation)
contains a logically contiguous piece of the text. For example, "ish ARA" is a
logically contiguous piece of the sample text above. By contrast, a contiguous
substring of visual-order text may contain pieces of the text from distant parts
of a paragraph. ("ish" and "CIB" from the sample text above are not logically
adjacent.)
Sorting and searching in text (establishing lexical order among
strings) as well as any other kind of context-sensitive text analysis also rely
on the storage of text in logical order because such processing must match user
expectations.

When text is displayed or printed, it must be "reordered" into visual order with
some parts of the text laid out left-to-right, and other parts laid out
right-to-left. The Unicode standard specifies an algorithm for this
logical-to-visual reordering. It always works on a paragraph as a whole; the
actual positioning of the text on the screen or paper must then take line breaks
into account, based on the output of the bidirectional algorithm. The reordering
output is also used for cursor movement and selection.

Legacy systems frequently stored text in visual order to avoid reordering for
display. When exchanging data with such systems for processing in Unicode, it is
necessary to reorder the data from visual order to logical order and back. Such
not-for-display transformations are sometimes referred to as "storage layout"
transformations.

There are two problems with an "inverse reordering" from visual to logical order:
There may be more than one logical order of text that results in the same
display (logical-to-visual reordering is a many-to-one function), and there is
no standard algorithm for it. ICU's BiDi API provides a setting for "inverse"
operation that modifies the standard Unicode Bidi algorithm. However, it may not
always produce the expected results. Bidirectional data should be converted to
Unicode and reordered to logical order only once to avoid roundtrip losses. Just
as it is best to never convert to non-Unicode charsets, data should not be
reordered from logical to visual order except for display and printing.
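
The relationship between the two sample strings in this section, "english ARABIC text"
(logical) and "english CIBARA text" (visual), can be reproduced with a toy sketch in C.
It follows the convention used in the example (uppercase ASCII letters stand in for
right-to-left letters) and simply reverses each run of uppercase letters. The function
name `toy_logical_to_visual` is hypothetical, and this is an illustration only, not the
Unicode Bidirectional Algorithm: it assumes a left-to-right paragraph and ignores
numbers, neutrals between right-to-left words, mirroring, and embedding levels.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Reverse the characters of s[start..end) in place. */
static void reverse_span(char *s, size_t start, size_t end) {
    while (start + 1 < end) {
        char tmp = s[start];
        s[start] = s[end - 1];
        s[end - 1] = tmp;
        start++;
        end--;
    }
}

/*
 * Toy logical-to-visual reordering for a left-to-right paragraph.
 * Assumption for illustration: uppercase ASCII letters play the role of
 * right-to-left letters; every other character is left where it is.
 * Each maximal run of uppercase letters is reversed in place.
 * This is NOT the Unicode Bidirectional Algorithm.
 */
void toy_logical_to_visual(char *text) {
    size_t len = strlen(text);
    size_t i = 0;
    while (i < len) {
        if (isupper((unsigned char)text[i])) {
            size_t start = i;
            while (i < len && isupper((unsigned char)text[i])) {
                i++;
            }
            reverse_span(text, start, i);
        } else {
            i++;
        }
    }
}
```

Calling `toy_logical_to_visual` on a writable buffer holding "english ARABIC text"
rewrites it as "english CIBARA text". In this toy model a second call restores the
logical string, which is exactly what cannot be relied on with real bidirectional
text, where several logical strings may share one visual form.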

## References

ICU provides an implementation of the Unicode BiDi algorithm, as well as simple
functions to write a reordered version of the string using the generated
metadata. An "inverse" flag can be set to **approximate** visual-to-logical
reordering. See the ubidi.h header file and the [BiDi API
References](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html).

See [Unicode Standard Annex #9: The Bidirectional
Algorithm](http://www.unicode.org/reports/tr9/).

## Programming Examples in C and C++

See the [BiDi API reference](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html)
for more information.