<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="getting-started">
  <title>Getting started with HarfBuzz</title>
  <section id="an-overview-of-the-harfbuzz-shaping-api">
    <title>An overview of the HarfBuzz shaping API</title>
    <para>
      The core of the HarfBuzz shaping API is the function
      <function>hb_shape()</function>. This function takes a font, a
      buffer containing a string of Unicode codepoints and
      (optionally) a list of font features as its input. It replaces
      the codepoints in the buffer with the corresponding glyphs from
      the font, correctly ordered and positioned, and with any of the
      optional font features applied.
    </para>
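    <para>
      As a minimal sketch of the optional feature list, the helper
      below parses a feature string such as
      <literal>liga=0</literal> with
      <function>hb_feature_from_string()</function> and passes the
      result to <function>hb_shape()</function>. The helper name
      <function>shape_with_feature()</function> is hypothetical and
      used only for illustration; the sketch assumes that the caller
      has already created the font and filled the buffer.
    </para>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: shape buf with one user-supplied feature.
       * Returns 1 after shaping, or 0 if feature_string cannot be
       * parsed (in which case buf is left unshaped). */
      static int
      shape_with_feature(hb_font_t *font, hb_buffer_t *buf,
                         const char *feature_string)
      {
          hb_feature_t feature;

          /* -1 means feature_string is NUL-terminated. */
          if (!hb_feature_from_string(feature_string, -1, &amp;feature))
              return 0;

          /* A parsed feature with no explicit range applies to the
           * whole buffer. */
          hb_shape(font, buf, &amp;feature, 1);
          return 1;
      }
    </programlisting>
    <para>
      For example, calling
      <literal>shape_with_feature(font, buf, "liga=0")</literal>
      would request shaping with standard ligatures turned off.
    </para>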
    <para>
      In addition to holding the pre-shaping input (the Unicode
      codepoints that comprise the input string) and the post-shaping
      output (the glyphs and positions), a HarfBuzz buffer has several
      properties that affect shaping. The most important are the
      text-flow direction (e.g., left-to-right, right-to-left,
      top-to-bottom, or bottom-to-top), the script tag, and the
      language tag.
    </para>

    <para>
      For input string buffers, flags are available to denote when the
      buffer represents the beginning or end of a paragraph, to
      indicate whether or not to visibly render Unicode <literal>Default
      Ignorable</literal> codepoints, and to modify the cluster-merging
      behavior for the buffer. For shaped output buffers, the
      individual X and Y offsets and <literal>advances</literal>
      (the logical dimensions) of each glyph are
      accessible. HarfBuzz also flags glyphs as
      <literal>UNSAFE_TO_BREAK</literal> if breaking the string at
      that glyph (e.g., in a line-breaking or hyphenation process)
      would require re-shaping the text.
    </para>
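    <para>
      The flags described above can be sketched roughly as follows.
      The helper names <function>mark_whole_paragraph()</function>
      and <function>can_break_before()</function> are hypothetical;
      the second helper assumes that the buffer has already been
      shaped.
    </para>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: tell HarfBuzz that buf holds a complete
       * paragraph (both its beginning and its end). */
      static void
      mark_whole_paragraph(hb_buffer_t *buf)
      {
          hb_buffer_set_flags(buf, (hb_buffer_flags_t)
                              (HB_BUFFER_FLAG_BOT | HB_BUFFER_FLAG_EOT));
      }

      /* Hypothetical helper: after shaping, report whether the text can
       * be broken at glyph i without re-shaping. Returns 0 when i is
       * out of range. */
      static int
      can_break_before(hb_buffer_t *buf, unsigned int i)
      {
          unsigned int count;
          hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &amp;count);

          if (i &gt;= count)
              return 0;
          return !(hb_glyph_info_get_glyph_flags(&amp;info[i])
                   &amp; HB_GLYPH_FLAG_UNSAFE_TO_BREAK);
      }
    </programlisting>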

    <para>
      HarfBuzz also provides methods to compare the contents of
      buffers, join buffers, normalize buffer contents, and handle
      invalid codepoints, as well as to determine the state of a
      buffer (e.g., input codepoints or output glyphs). Buffer
      lifecycles are managed through reference counting.
    </para>
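    <para>
      As an illustration of buffer state and reference counting, the
      sketch below reports a buffer's content type and takes an extra
      reference to a buffer. Both helper names,
      <function>buffer_state()</function> and
      <function>share_buffer()</function>, are hypothetical.
    </para>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: describe what buf currently holds. */
      static const char *
      buffer_state(hb_buffer_t *buf)
      {
          switch (hb_buffer_get_content_type(buf)) {
          case HB_BUFFER_CONTENT_TYPE_UNICODE: return "unicode"; /* input  */
          case HB_BUFFER_CONTENT_TYPE_GLYPHS:  return "glyphs";  /* output */
          default:                             return "invalid"; /* empty  */
          }
      }

      /* Hypothetical helper: hand out a second owner of buf. Each owner
       * must call hb_buffer_destroy() once; the buffer is freed when
       * the last reference is dropped. */
      static hb_buffer_t *
      share_buffer(hb_buffer_t *buf)
      {
          return hb_buffer_reference(buf);
      }
    </programlisting>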

    <para>
      Although the default <function>hb_shape()</function> function is
      sufficient for most use cases, a variant is also provided that
      lets you specify which of HarfBuzz's shapers to use on a buffer.
    </para>
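    <para>
      The variant in question is
      <function>hb_shape_full()</function>, which takes a
      NULL-terminated list of shaper names. The sketch below assumes
      that the build includes the <literal>ot</literal> shaper and
      checks for it with
      <function>hb_shape_list_shapers()</function>; the helper names
      <function>has_shaper()</function> and
      <function>shape_with_ot_only()</function> are hypothetical.
    </para>
    <programlisting language="C">
      #include &lt;hb.h&gt;
      #include &lt;string.h&gt;

      /* Hypothetical helper: is a shaper with this name compiled in? */
      static int
      has_shaper(const char *name)
      {
          /* The returned list is NULL-terminated and owned by HarfBuzz. */
          const char **shapers = hb_shape_list_shapers();

          for (; *shapers; shapers++)
              if (strcmp(*shapers, name) == 0)
                  return 1;
          return 0;
      }

      /* Hypothetical helper: shape buf using only the "ot" shaper.
       * Returns false if that shaper is unavailable or shaping fails. */
      static hb_bool_t
      shape_with_ot_only(hb_font_t *font, hb_buffer_t *buf)
      {
          const char *shaper_list[] = { "ot", NULL };

          if (!has_shaper("ot"))
              return 0;
          return hb_shape_full(font, buf, NULL, 0, shaper_list);
      }
    </programlisting>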

    <para>
      HarfBuzz can read TrueType fonts, TrueType collections, OpenType
      fonts, and OpenType collections. Functions are provided to query
      font objects about metrics, Unicode coverage, available tables and
      features, and variation selectors. Individual glyphs can also be
      queried for metrics, variations, and glyph names. OpenType
      variable fonts are supported, and HarfBuzz allows you to set
      variation-axis coordinates on font objects.
    </para>
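    <para>
      A rough sketch of two of these queries follows. The helper
      <function>set_weight()</function> sets the registered
      <literal>wght</literal> axis and assumes that the font is a
      variable font with that axis;
      <function>lookup_glyph()</function> checks Unicode coverage for
      one code point and fetches the glyph's horizontal advance. Both
      helper names are hypothetical.
    </para>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: set the weight axis on a variable font. */
      static void
      set_weight(hb_font_t *font, float weight)
      {
          hb_variation_t variation = { HB_TAG('w','g','h','t'), weight };

          hb_font_set_variations(font, &amp;variation, 1);
      }

      /* Hypothetical helper: look up the glyph that the font maps to a
       * code point. Returns 1 and fills in *glyph and *advance when the
       * font covers the code point, or 0 otherwise. */
      static int
      lookup_glyph(hb_font_t *font, hb_codepoint_t unicode,
                   hb_codepoint_t *glyph, hb_position_t *advance)
      {
          if (!hb_font_get_nominal_glyph(font, unicode, glyph))
              return 0;
          *advance = hb_font_get_glyph_h_advance(font, *glyph);
          return 1;
      }
    </programlisting>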

    <para>
      HarfBuzz provides glue code to integrate with various other
      libraries, including FreeType, GObject, and CoreText. Support
      for integrating with Uniscribe and DirectWrite is experimental
      at present.
    </para>
  </section>

  <section id="terminology">
    <title>Terminology</title>
      <variablelist>
        <?dbfo list-presentation="blocks"?>
        <varlistentry>
          <term>script</term>
          <listitem>
            <para>
              In text shaping, a <emphasis>script</emphasis> is a
              writing system: a set of symbols, rules, and conventions
              that is used to represent a language or multiple
              languages.
            </para>
            <para>
              In general computing lingo, the word "script" can also
              be used to mean an executable program (usually one
              written in a human-readable programming language). For
              the sake of clarity, HarfBuzz documents will always use
              more specific terminology when referring to this
              meaning, such as "Python script" or "shell script." In
              all other instances, "script" refers to a writing system.
            </para>
            <para>
              For developers using HarfBuzz, it is important to note
              the distinction between a script and a language. Most
              scripts are used to write a variety of different
              languages, and many languages may be written in more
              than one script.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>shaper</term>
          <listitem>
            <para>
              In HarfBuzz, a <emphasis>shaper</emphasis> is a
              handler for a specific script-shaping model. HarfBuzz
              implements separate shapers for Indic, Arabic, Thai and
              Lao, Khmer, Myanmar, Tibetan, Hangul, and Hebrew, as
              well as the Universal Shaping Engine (USE) and a
              default shaper for non-complex scripts.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>cluster</term>
          <listitem>
            <para>
              In text shaping, a <emphasis>cluster</emphasis> is a
              sequence of codepoints that must be treated as an
              indivisible unit. Clusters can include code-point
              sequences that form a ligature or base-and-mark
              sequences. Tracking and preserving clusters is important
              when shaping operations might separate or reorder
              code points.
            </para>
            <para>
              HarfBuzz provides three cluster
              <emphasis>levels</emphasis> that implement different
              approaches to the problem of preserving clusters during
              shaping operations.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>grapheme</term>
          <listitem>
            <para>
              In linguistics, a <emphasis>grapheme</emphasis> is one
              of the indivisible units that make up a writing system or
              script. Often, graphemes are individual symbols (letters,
              numbers, punctuation marks, logograms, etc.) but,
              depending on the writing system, a particular grapheme
              might correspond to a sequence of several Unicode code
              points.
            </para>
            <para>
              In practice, HarfBuzz and other text-shaping engines
              are not generally concerned with graphemes. However, it
              is important for developers using HarfBuzz to recognize
              that there is a difference between graphemes and shaping
              clusters (see above). The two concepts may overlap
              frequently, but there is no guarantee that they will be
              identical.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>syllable</term>
          <listitem>
            <para>
              In linguistics, a <emphasis>syllable</emphasis> is
              a sequence of sounds that makes up a building block of a
              particular language. Every language has its own set of
              rules describing what constitutes a valid syllable.
            </para>
            <para>
              For text-shaping purposes, the various definitions of
              "syllable" are important because script-specific shaping
              operations may be applied at the syllable level. For
              example, a reordering rule might specify that a vowel
              mark be reordered to the beginning of the syllable.
            </para>
            <para>
              Syllables consist of one or more Unicode code
              points. The definition of a syllable for a particular
              writing system might correspond to how HarfBuzz
              identifies clusters (see above) for the same writing
              system. However, it is important for developers using
              HarfBuzz to recognize that there is a difference between
              syllables and shaping clusters. The two concepts may
              overlap frequently, but there is no guarantee that they
              will be identical.
            </para>
          </listitem>
        </varlistentry>
      </variablelist>
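
    <para>
      To make the cluster entry above more concrete, the sketch below
      selects one of the three cluster levels and reads a cluster
      value back from the buffer. Before shaping, each item's cluster
      value is the index of its first code unit in the input text
      (a byte offset for UTF-8 input). The helper names
      <function>use_character_clusters()</function> and
      <function>cluster_of()</function> are hypothetical.
    </para>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* Hypothetical helper: keep cluster values per character rather
       * than merging them per grapheme. Call this before shaping. */
      static void
      use_character_clusters(hb_buffer_t *buf)
      {
          hb_buffer_set_cluster_level(buf,
              HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS);
      }

      /* Hypothetical helper: cluster value of item i in buf, or
       * (unsigned int) -1 when i is out of range. */
      static unsigned int
      cluster_of(hb_buffer_t *buf, unsigned int i)
      {
          unsigned int count;
          hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &amp;count);

          return i &lt; count ? info[i].cluster : (unsigned int) -1;
      }
    </programlisting>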

  </section>


  <section id="a-simple-shaping-example">
    <title>A simple shaping example</title>

    <para>
      Below is the simplest HarfBuzz shaping example possible.
    </para>
    <orderedlist numeration="arabic">
      <listitem>
        <para>
          Create a buffer and put your text in it.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* text is assumed to be a NUL-terminated UTF-8 string. */
      hb_buffer_t *buf;
      buf = hb_buffer_create();
      hb_buffer_add_utf8(buf, text, -1, 0, -1);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="2">
        <para>
          Set the script, language and direction of the buffer.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
      hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
      hb_buffer_set_language(buf, hb_language_from_string("en", -1));
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="3">
        <para>
          Create a face and a font from a font file.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      /* filename is assumed to be the path to a font file. */
      hb_blob_t *blob = hb_blob_create_from_file(filename);
      hb_face_t *face = hb_face_create(blob, 0);
      hb_font_t *font = hb_font_create(face);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="4">
        <para>
          Shape!
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_shape(font, buf, NULL, 0);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="5">
        <para>
          Get the glyph and position information.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      unsigned int glyph_count;
      hb_glyph_info_t *glyph_info    = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
      hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="6">
        <para>
          Iterate over each glyph.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_position_t cursor_x = 0;
      hb_position_t cursor_y = 0;
      for (unsigned int i = 0; i &lt; glyph_count; i++) {
          hb_codepoint_t glyphid  = glyph_info[i].codepoint;
          hb_position_t x_offset  = glyph_pos[i].x_offset;
          hb_position_t y_offset  = glyph_pos[i].y_offset;
          hb_position_t x_advance = glyph_pos[i].x_advance;
          hb_position_t y_advance = glyph_pos[i].y_advance;
       /* draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset); */
          cursor_x += x_advance;
          cursor_y += y_advance;
      }
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="7">
        <para>
          Tidy up.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_buffer_destroy(buf);
      hb_font_destroy(font);
      hb_face_destroy(face);
      hb_blob_destroy(blob);
    </programlisting>

    <para>
      This example shows enough to get us started using HarfBuzz. In
      the sections that follow, we will use the remainder of
      HarfBuzz's API to refine and extend the example and improve its
      text-shaping capabilities.
    </para>
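    <para>
      For convenience, the steps above can be gathered into a single
      function. The version below is only a sketch: the name
      <function>shape_and_print()</function> is hypothetical, the
      script, language, and direction are hard-coded as in the
      example, and a failed font load is detected by checking for an
      empty blob.
    </para>
    <programlisting language="C">
      #include &lt;hb.h&gt;
      #include &lt;stdio.h&gt;

      /* Hypothetical helper: shape text with the font at font_path and
       * print each glyph. Returns the number of glyphs, or -1 if the
       * font file cannot be read. */
      static int
      shape_and_print(const char *font_path, const char *text)
      {
          hb_blob_t *blob = hb_blob_create_from_file(font_path);
          if (hb_blob_get_length(blob) == 0) {
              hb_blob_destroy(blob);   /* safe on the empty blob */
              return -1;
          }
          hb_face_t *face = hb_face_create(blob, 0);
          hb_font_t *font = hb_font_create(face);

          hb_buffer_t *buf = hb_buffer_create();
          hb_buffer_add_utf8(buf, text, -1, 0, -1);
          hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
          hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
          hb_buffer_set_language(buf, hb_language_from_string("en", -1));

          hb_shape(font, buf, NULL, 0);

          unsigned int glyph_count;
          hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
          hb_glyph_position_t *pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
          for (unsigned int i = 0; i &lt; glyph_count; i++)
              printf("glyph %u: advance (%d, %d), offset (%d, %d)\n",
                     (unsigned int) info[i].codepoint,
                     (int) pos[i].x_advance, (int) pos[i].y_advance,
                     (int) pos[i].x_offset, (int) pos[i].y_offset);

          hb_buffer_destroy(buf);
          hb_font_destroy(font);
          hb_face_destroy(face);
          hb_blob_destroy(blob);
          return (int) glyph_count;
      }
    </programlisting>
    <para>
      A caller might invoke it as
      <literal>shape_and_print(argv[1], argv[2])</literal> from
      <function>main()</function> after checking the argument count.
    </para>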
  </section>
</chapter>