<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="getting-started">
  <title>Getting started with HarfBuzz</title>
  <section id="an-overview-of-the-harfbuzz-shaping-api">
    <title>An overview of the HarfBuzz shaping API</title>
    <para>
      The core of the HarfBuzz shaping API is the function
      <function>hb_shape()</function>. This function takes as input a
      font, a buffer containing a string of Unicode codepoints, and
      (optionally) a list of font features. It replaces the
      codepoints in the buffer with the corresponding glyphs from the
      font, correctly ordered and positioned, and with any of the
      requested font features applied.
    </para>
    <para>
      In addition to holding the pre-shaping input (the Unicode
      codepoints that make up the input string) and the post-shaping
      output (the glyphs and their positions), a HarfBuzz buffer has
      several properties that affect shaping. The most important are
      the text-flow direction (e.g., left-to-right, right-to-left,
      top-to-bottom, or bottom-to-top), the script tag, and the
      language tag.
    </para>

    <para>
      For input buffers, flags are available to indicate that the
      buffer represents the beginning or end of a paragraph, to
      control whether Unicode <literal>Default Ignorable</literal>
      codepoints are rendered visibly, and to modify the
      cluster-merging behavior for the buffer. For shaped output
      buffers, the X and Y offsets and the
      <literal>advances</literal> (the logical dimensions) of each
      glyph are accessible. HarfBuzz also flags a glyph as
      <literal>UNSAFE_TO_BREAK</literal> if breaking the text at
      that glyph (e.g., during line breaking or hyphenation) would
      require re-shaping the text.
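      To make the input flags and the <literal>UNSAFE_TO_BREAK</literal>
      glyph flag more concrete, here is a minimal sketch, assuming the
      buffer holds one complete paragraph of text. The helper names
      <function>mark_as_paragraph()</function> and
      <function>count_safe_breaks()</function> are hypothetical and not
      part of the HarfBuzz API; the flag, cluster-level, and accessor
      names they call are HarfBuzz's own.
      <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical helper, called before hb_shape(): declare that the buffer
 * holds a whole paragraph (beginning and end of text), and request the
 * monotone-characters cluster level instead of the default
 * monotone-graphemes level. */
static void mark_as_paragraph(hb_buffer_t *buf)
{
  hb_buffer_set_flags(buf, (hb_buffer_flags_t) (HB_BUFFER_FLAG_BOT |
                                                HB_BUFFER_FLAG_EOT));
  hb_buffer_set_cluster_level(buf, HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS);
}

/* Hypothetical helper, called after hb_shape(): count the glyphs whose
 * cluster start is a point where the text could be broken without
 * re-shaping either side. */
static unsigned int count_safe_breaks(hb_buffer_t *buf)
{
  unsigned int len = 0, safe = 0;
  hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &len);
  for (unsigned int i = 0; i < len; i++)
    if (!(hb_glyph_info_get_glyph_flags(&info[i]) & HB_GLYPH_FLAG_UNSAFE_TO_BREAK))
      safe++;
  return safe;
}
]]></programlisting>
      The first helper only changes buffer properties, so it must run
      before <function>hb_shape()</function>; the second is meaningful
      only once the buffer holds shaped glyphs.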
    </para>

    <para>
      HarfBuzz also provides methods to compare the contents of
      buffers, join buffers, normalize buffer contents, and handle
      invalid codepoints, as well as to determine the state of a
      buffer (e.g., whether it holds input codepoints or output
      glyphs). All buffers are reference-counted, and their
      lifecycles are managed with <function>hb_buffer_create()</function>,
      <function>hb_buffer_reference()</function>, and
      <function>hb_buffer_destroy()</function>.
    </para>

    <para>
      Although the default <function>hb_shape()</function> function is
      sufficient for most use cases, HarfBuzz also provides a variant,
      <function>hb_shape_full()</function>, that lets you specify which
      shaper backends to try on a buffer.
    </para>

    <para>
      HarfBuzz can read TrueType fonts, TrueType collections, OpenType
      fonts, and OpenType collections. Functions are provided to query
      font objects about metrics, Unicode coverage, available tables and
      features, and variation selectors. Individual glyphs can also be
      queried for metrics, variations, and glyph names. OpenType
      variable fonts are supported, and HarfBuzz allows you to set
      variation-axis coordinates on font objects.
    </para>

    <para>
      HarfBuzz provides glue code to integrate with various other
      libraries, including FreeType, GObject, and CoreText. Support
      for integrating with Uniscribe and DirectWrite is experimental
      at present.
    </para>
  </section>

  <section id="terminology">
    <title>Terminology</title>
    <variablelist>
      <?dbfo list-presentation="blocks"?>
      <varlistentry>
        <term>script</term>
        <listitem>
          <para>
            In text shaping, a <emphasis>script</emphasis> is a
            writing system: a set of symbols, rules, and conventions
            used to represent one or more languages.
          </para>
          <para>
            In general computing usage, the word "script" can also
            mean an executable program (usually one written in a
            human-readable programming language).
            For the sake of clarity, the HarfBuzz documentation always
            uses more specific terminology when referring to this
            meaning, such as "Python script" or "shell script." In all
            other instances, "script" refers to a writing system.
          </para>
          <para>
            For developers using HarfBuzz, it is important to note
            the distinction between a script and a language. Most
            scripts are used to write a variety of different
            languages, and many languages may be written in more
            than one script.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>shaper</term>
        <listitem>
          <para>
            In HarfBuzz, a <emphasis>shaper</emphasis> is a
            handler for a specific script-shaping model. HarfBuzz
            implements separate shapers for Indic, Arabic, Thai and
            Lao, Khmer, Myanmar, Tibetan, Hangul, Hebrew, the
            Universal Shaping Engine (USE), and a default shaper for
            scripts with no script-specific shaping model.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>cluster</term>
        <listitem>
          <para>
            In text shaping, a <emphasis>cluster</emphasis> is a
            sequence of codepoints that must be treated as an
            indivisible unit. Examples include codepoint sequences
            that form a ligature and base-plus-mark sequences.
            Tracking and preserving clusters is important when
            shaping operations might separate or reorder codepoints.
          </para>
          <para>
            HarfBuzz provides three cluster
            <emphasis>levels</emphasis> that implement different
            approaches to the problem of preserving clusters during
            shaping operations.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>grapheme</term>
        <listitem>
          <para>
            In linguistics, a <emphasis>grapheme</emphasis> is one
            of the indivisible units that make up a writing system or
            script. Often, graphemes are individual symbols (letters,
            numbers, punctuation marks, logograms, etc.)
            but, depending on the writing system, a particular
            grapheme might correspond to a sequence of several
            Unicode codepoints.
          </para>
          <para>
            In practice, HarfBuzz and other text-shaping engines
            are not generally concerned with graphemes. However, it
            is important for developers using HarfBuzz to recognize
            that graphemes and shaping clusters (see above) are
            different things. The two often coincide, but there is no
            guarantee that they will be identical.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>syllable</term>
        <listitem>
          <para>
            In linguistics, a <emphasis>syllable</emphasis> is a
            sequence of sounds that makes up a building block of a
            particular language. Every language has its own set of
            rules describing what constitutes a valid syllable.
          </para>
          <para>
            For text-shaping purposes, the various definitions of
            "syllable" are important because script-specific shaping
            operations may be applied at the syllable level. For
            example, a reordering rule might specify that a vowel
            mark be moved to the beginning of the syllable.
          </para>
          <para>
            A syllable consists of one or more Unicode codepoints.
            The definition of a syllable for a particular writing
            system might correspond to how HarfBuzz identifies
            clusters (see above) for the same writing system.
            However, it is important for developers using HarfBuzz to
            recognize that syllables and shaping clusters are
            different things. The two often coincide, but there is no
            guarantee that they will be identical.
          </para>
        </listitem>
      </varlistentry>
    </variablelist>

  </section>


  <section id="a-simple-shaping-example">
    <title>A simple shaping example</title>

    <para>
      Below is the simplest possible HarfBuzz shaping example.
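      Assembled into a single function, the seven steps below look
      roughly like the following sketch. The function name
      <function>shape_and_print()</function>, its parameters, and the
      <function>printf()</function> output are illustrative
      assumptions, not part of HarfBuzz; like step 2, it hard-codes
      left-to-right Latin text in English.
      <programlisting language="C"><![CDATA[
#include <hb.h>
#include <stdio.h>

/* Hypothetical helper: runs steps 1 to 7 for one UTF-8 string and one
 * font file, prints each glyph ID with its pen position, and returns
 * the number of glyphs produced. */
static unsigned int shape_and_print(const char *fontfile, const char *text)
{
  /* Steps 1 and 2: fill the buffer and set its segment properties. */
  hb_buffer_t *buf = hb_buffer_create();
  hb_buffer_add_utf8(buf, text, -1, 0, -1);
  hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
  hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
  hb_buffer_set_language(buf, hb_language_from_string("en", -1));

  /* Step 3: a missing or unreadable file yields an empty blob rather
   * than an error; hb_blob_create_from_file_or_fail() returns NULL. */
  hb_blob_t *blob = hb_blob_create_from_file(fontfile);
  hb_face_t *face = hb_face_create(blob, 0);
  hb_font_t *font = hb_font_create(face);

  /* Step 4: shape with no extra font features. */
  hb_shape(font, buf, NULL, 0);

  /* Steps 5 and 6: read the results and advance the pen. */
  unsigned int glyph_count;
  hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &glyph_count);
  hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &glyph_count);
  hb_position_t cursor_x = 0, cursor_y = 0;
  for (unsigned int i = 0; i < glyph_count; i++) {
    printf("glyph %u at (%d, %d)\n",
           (unsigned) glyph_info[i].codepoint,
           (int) (cursor_x + glyph_pos[i].x_offset),
           (int) (cursor_y + glyph_pos[i].y_offset));
    cursor_x += glyph_pos[i].x_advance;
    cursor_y += glyph_pos[i].y_advance;
  }

  /* Step 7: release every object created above. */
  hb_buffer_destroy(buf);
  hb_font_destroy(font);
  hb_face_destroy(face);
  hb_blob_destroy(blob);
  return glyph_count;
}
]]></programlisting>
      A command-line program could call it as
      <literal>shape_and_print(argv[1], argv[2])</literal>, passing a
      font path and a UTF-8 string. The numbered steps that follow
      explain each part in turn.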
    </para>
    <orderedlist numeration="arabic">
      <listitem>
        <para>
          Create a buffer and put your text in it.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* text: NUL-terminated UTF-8; the -1 lengths mean "measure it and add all of it". */
      hb_buffer_t *buf;
      buf = hb_buffer_create();
      hb_buffer_add_utf8(buf, text, -1, 0, -1);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="2">
        <para>
          Set the script, language, and direction of the buffer.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      /* hb_buffer_guess_segment_properties(buf) can instead fill in any left unset. */
      hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
      hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
      hb_buffer_set_language(buf, hb_language_from_string("en", -1));
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="3">
        <para>
          Create a face and a font from a font file.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      /* filename: path to the font file; index 0 selects the first face in a collection. */
      hb_blob_t *blob = hb_blob_create_from_file(filename); /* or hb_blob_create_from_file_or_fail() */
      hb_face_t *face = hb_face_create(blob, 0);
      hb_font_t *font = hb_font_create(face);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="4">
        <para>
          Shape!
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      /* NULL, 0: no user-specified font features. */
      hb_shape(font, buf, NULL, 0);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="5">
        <para>
          Get the glyph and position information.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      /* Both arrays have glyph_count entries and are owned by the buffer. */
      unsigned int glyph_count;
      hb_glyph_info_t *glyph_info    = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
      hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="6">
        <para>
          Iterate over each glyph.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_position_t cursor_x = 0;
      hb_position_t cursor_y = 0;
      for (unsigned int i = 0; i &lt; glyph_count; i++) {
        /* After shaping, .codepoint holds a glyph ID, not a Unicode codepoint. */
        hb_codepoint_t glyphid  = glyph_info[i].codepoint;
        hb_position_t x_offset  = glyph_pos[i].x_offset;
        hb_position_t y_offset  = glyph_pos[i].y_offset;
        hb_position_t x_advance = glyph_pos[i].x_advance;
        hb_position_t y_advance = glyph_pos[i].y_advance;
        /* draw_glyph() is a placeholder for your own rendering code. */
        /* draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset); */
        cursor_x += x_advance;
        cursor_y += y_advance;
      }
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="7">
        <para>
          Tidy up.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      /* Each object is reference-counted; destroy releases our reference. */
      hb_buffer_destroy(buf);
      hb_font_destroy(font);
      hb_face_destroy(face);
      hb_blob_destroy(blob);
    </programlisting>

    <para>
      This example covers enough to get started with HarfBuzz. In
      the sections that follow, we will use the rest of HarfBuzz's
      API to refine and extend the example and improve its
      text-shaping capabilities.
    </para>
  </section>
</chapter>