<!-- ##### SECTION Title ##### -->
Character Set Conversion

<!-- ##### SECTION Short_Description ##### -->
convert strings between different character sets using iconv()

<!-- ##### SECTION Long_Description ##### -->
<para>

</para>

    <refsect2 id="file-name-encodings">
      <title>File Name Encodings</title>

      <para>
	Historically, Unix has not had a defined encoding for file
	names:  a file name is valid as long as it does not contain
	path separators ("/").  However, displaying file names may
	require conversion from the character set in which they were
	created to the character set in which the application
	operates.  Consider the Spanish file name
	"<filename>Presentaci&oacute;n.sxi</filename>".  If the
	application which created it uses ISO-8859-1 for its encoding,
	then the actual file name on disk would look like this:
      </para>

      <programlisting id="filename-iso8859-1">
Character:  P  r  e  s  e  n  t  a  c  i  &oacute;  n  .  s  x  i
Hex code:   50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69
      </programlisting>

      <para>
	However, if the application uses UTF-8, the actual file name on
	disk would look like this:
      </para>

      <programlisting id="filename-utf-8">
Character:  P  r  e  s  e  n  t  a  c  i  &oacute;     n  .  s  x  i
Hex code:   50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69
      </programlisting>

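      <para>
	The two listings differ only in how "&oacute;" is stored.  As a
	minimal, dependency-free sketch (plain C, not GLib code), the
	helper below re-encodes ISO-8859-1 bytes as UTF-8.  The names
	<function>latin1_to_utf8()</function> and
	<function>print_hex()</function> are hypothetical and exist
	only for this illustration.
      </para>

      <informalexample><programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdio.h>

/*
 * Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper, not a GLib API).
 * Every ISO-8859-1 byte maps to the Unicode code point with the same value,
 * so bytes below 0x80 are copied unchanged and bytes 0x80..0xff become a
 * two-byte UTF-8 sequence.
 *
 * `out` must have room for 2 * len + 1 bytes.  Returns the number of bytes
 * written, not counting the terminating NUL.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[n++] = c;
        } else {
            out[n++] = (unsigned char)(0xc0 | (c >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (c & 0x3f)); /* continuation byte */
        }
    }
    out[n] = '\0';
    return n;
}

/* Print bytes as space-separated hex, in the style of the listings above. */
void print_hex(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        printf("%02x%s", s[i], i + 1 < len ? " " : "\n");
}
```
]]></programlisting></informalexample>

      <para>
	Given the 16 bytes of the ISO-8859-1 listing, this helper
	produces the 17 bytes of the UTF-8 listing:  the single byte
	<literal>f3</literal> becomes the pair <literal>c3 b3</literal>,
	and every other byte is unchanged.
      </para>
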
      <para>
	GLib uses UTF-8 for its strings, and GUI toolkits like GTK+
	that use GLib do the same thing.  If you get a file name from
	the file system, for example, from
	<function>readdir(3)</function> or from <link
	linkend="g_dir_read_name"><function>g_dir_read_name()</function></link>,
	and you wish to display the file name to the user, you
	<emphasis>will</emphasis> need to convert it into UTF-8.  The
	opposite case is when the user types the name of a file they
	wish to save:  the toolkit will give you that string in
	UTF-8 encoding, and you will need to convert it to the
	character set used for file names before you can create the
	file with <function>open(2)</function> or
	<function>fopen(3)</function>.
      </para>

      <para>
	By default, GLib assumes that file names on disk are in UTF-8
	encoding.  This is a valid assumption for file systems which
	were created relatively recently:  most applications use UTF-8
	encoding for their strings, and that is also what they use for
	the file names they create.  However, older file systems may
	still contain file names created in "older" encodings, such as
	ISO-8859-1.  In this case, for compatibility reasons, you may
	want to instruct GLib to use that particular encoding for file
	names rather than UTF-8.  You can do this by specifying the
	encoding for file names in the <link
	linkend="G_FILENAME_ENCODING"><envar>G_FILENAME_ENCODING</envar></link>
	environment variable.  For example, if your installation uses
	ISO-8859-1 for file names, you can put this in your
	<filename>~/.profile</filename>:
      </para>

      <programlisting>
export G_FILENAME_ENCODING=ISO-8859-1
      </programlisting>

      <para>
	GLib provides the functions <link
	linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>
	and <link
	linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>
	to perform the necessary conversions.  These functions convert
	file names from the encoding specified in
	<envar>G_FILENAME_ENCODING</envar> to UTF-8 and vice versa.
	<xref linkend="file-name-encodings-diagram"/> illustrates how
	these functions are used to convert between UTF-8 and the
	encoding for file names in the file system.
      </para>

      <figure id="file-name-encodings-diagram">
	<title>Conversion between File Name Encodings</title>
	<graphic fileref="file-name-encodings.png" format="PNG"/>
      </figure>

      <refsect3 id="file-name-encodings-checklist">
	<title>Checklist for Application Writers</title>

	<para>
	  This section is a practical summary of the detailed
	  description above.  You can use this as a checklist of
	  things to do to make sure your applications process file
	  name encodings correctly.
	</para>

	<orderedlist>
	  <listitem>
	    <para>
	      If you get a file name from the file system, from a
	      function such as <function>readdir(3)</function> or
	      <function>gtk_file_chooser_get_filename()</function>,
	      you do not need to do any conversion to pass that
	      file name to functions like <function>open(2)</function>,
	      <function>rename(2)</function>, or
	      <function>fopen(3)</function>; those are "raw"
	      file names which the file system understands.
	    </para>
	  </listitem>

	  <listitem>
	    <para>
	      If you need to display a file name, convert it to UTF-8
	      first by using <link
	      linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>.
	      If conversion fails, display a string like
	      "<literal>Unknown file name</literal>".  <emphasis>Do
	      not</emphasis> convert this string back into the
	      encoding used for file names if you wish to pass it to
	      the file system; use the original file name instead.
	      For example, the document window of a word processor
	      could display "Unknown file name" in its title bar but
	      still let the user save the file, as it would keep the
	      raw file name internally.  This can happen if the user
	      has not set the <envar>G_FILENAME_ENCODING</envar>
	      environment variable even though they have files whose
	      names are not encoded in UTF-8.
	    </para>
	  </listitem>

	  <listitem>
	    <para>
	      If your user interface lets the user type a file name
	      for saving or renaming, convert it to the encoding used
	      for file names in the file system by using <link
	      linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>.
	      Pass the converted file name to functions like
	      <function>fopen(3)</function>.  If conversion fails, ask
	      the user to enter a different file name.  This can
	      happen if the user types Japanese characters when
	      <envar>G_FILENAME_ENCODING</envar> is set to
	      <literal>ISO-8859-1</literal>, for example.
	    </para>
	  </listitem>
	</orderedlist>
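
	<para>
	  The checklist can be sketched without GLib by calling POSIX
	  <function>iconv(3)</function> directly.  This is an
	  illustration under stated assumptions, not GLib's
	  implementation:  <function>convert_or_null()</function>,
	  <function>name_for_display()</function> and
	  <function>name_for_saving()</function> are hypothetical
	  names, only stateless character sets are considered, and the
	  file name character set is passed in explicitly instead of
	  being read from <envar>G_FILENAME_ENCODING</envar>.  In a
	  GLib application,
	  <function>g_filename_to_utf8()</function> and
	  <function>g_filename_from_utf8()</function> would take the
	  place of the two wrappers.
	</para>

	<informalexample><programlisting><![CDATA[
```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert the NUL-terminated string `s` from charset `from` to charset `to`.
 * Returns a malloc()ed string, or NULL on any failure: unknown charset,
 * invalid input, or a character that `to` cannot represent.
 * Hypothetical helper; assumes stateless charsets such as UTF-8 or ISO-8859-1.
 */
char *convert_or_null(const char *s, const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(s);
    size_t out_size = in_left * 4 + 1;   /* ample for the charsets assumed here */
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)s;            /* iconv() wants char **, input is only read */
    char *out_ptr = out;
    size_t out_left = out_size - 1;      /* keep one byte for the NUL */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    /* rc is (size_t)-1 on error; rc > 0 counts irreversible substitutions.
     * Both are treated as failure, so only an exact conversion succeeds. */
    if (rc != 0 || in_left != 0) {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}

/* Checklist item 2: text to show the user.  The caller keeps `raw_name`
 * unchanged for open(2), rename(2) and friends. */
char *name_for_display(const char *raw_name, const char *fs_charset)
{
    static const char fallback[] = "Unknown file name";
    char *utf8 = convert_or_null(raw_name, "UTF-8", fs_charset);

    if (utf8 == NULL) {
        utf8 = malloc(sizeof fallback);
        if (utf8 != NULL)
            memcpy(utf8, fallback, sizeof fallback);
    }
    return utf8;
}

/* Checklist item 3: NULL means "ask the user for a different file name". */
char *name_for_saving(const char *typed_utf8, const char *fs_charset)
{
    return convert_or_null(typed_utf8, fs_charset, "UTF-8");
}
```
]]></programlisting></informalexample>

	<para>
	  In this sketch, the display string is never converted back:
	  the raw name and the display name are separate values, and
	  only the raw name reaches the file system.  A name typed in
	  Japanese cannot be represented in ISO-8859-1, so
	  <function>name_for_saving()</function> reports failure
	  rather than writing a substituted name to disk.
	</para>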
      </refsect3>
    </refsect2>

<!-- ##### SECTION See_Also ##### -->
<para>

</para>

<!-- ##### SECTION Stability_Level ##### -->


<!-- ##### FUNCTION g_convert ##### -->
<para>

</para>

@str:
@len:
@to_codeset:
@from_codeset:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_convert_with_fallback ##### -->
<para>

</para>

@str:
@len:
@to_codeset:
@from_codeset:
@fallback:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### STRUCT GIConv ##### -->
<para>
The <structname>GIConv</structname> struct wraps an
<function>iconv()</function> conversion descriptor. It contains private data
and should only be accessed using the following functions.
</para>


<!-- ##### FUNCTION g_convert_with_iconv ##### -->
<para>

</para>

@str:
@len:
@converter:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### MACRO G_CONVERT_ERROR ##### -->
<para>
Error domain for character set conversions. Errors in this domain will
be from the #GConvertError enumeration. See #GError for information on
error domains.
</para>



<!-- ##### FUNCTION g_iconv_open ##### -->
<para>

</para>

@to_codeset:
@from_codeset:
@Returns:


<!-- ##### FUNCTION g_iconv ##### -->
<para>

</para>

@converter:
@inbuf:
@inbytes_left:
@outbuf:
@outbytes_left:
@Returns:


<!-- ##### FUNCTION g_iconv_close ##### -->
<para>

</para>

@converter:
@Returns:


<!-- ##### FUNCTION g_locale_to_utf8 ##### -->
<para>

</para>

@opsysstring:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_to_utf8 ##### -->
<para>

</para>

@opsysstring:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_from_utf8 ##### -->
<para>

</para>

@utf8string:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_from_uri ##### -->
<para>

</para>

@uri:
@hostname:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_to_uri ##### -->
<para>

</para>

@filename:
@hostname:
@error:
@Returns:


<!-- ##### FUNCTION g_get_filename_charsets ##### -->
<para>

</para>

@charsets:
@Returns:


<!-- ##### FUNCTION g_filename_display_name ##### -->
<para>

</para>

@filename:
@Returns:


<!-- ##### FUNCTION g_filename_display_basename ##### -->
<para>

</para>

@filename:
@Returns:


<!-- ##### FUNCTION g_uri_list_extract_uris ##### -->
<para>

</para>

@uri_list:
@Returns:


<!-- ##### FUNCTION g_locale_from_utf8 ##### -->
<para>

</para>

@utf8string:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### ENUM GConvertError ##### -->
<para>
Error codes returned by character set conversion routines.
</para>
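<para>
As a rough illustration of when the conversion-related codes of this
enumeration might arise, the sketch below classifies a conversion attempt
made with plain POSIX <function>iconv(3)</function>.  The type
<type>enum convert_status</type> and the function
<function>classify_conversion()</function> are hypothetical, and the mapping
from <varname>errno</varname> values to codes is an assumption made for this
example, not a description of how GLib reports errors.
</para>

<informalexample><programlisting><![CDATA[
```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical local names that mirror four of the GConvertError members. */
enum convert_status {
    CONVERT_OK,
    CONVERT_NO_CONVERSION,      /* iconv_open() does not know the charset pair */
    CONVERT_ILLEGAL_SEQUENCE,   /* iconv() failed with EILSEQ */
    CONVERT_PARTIAL_INPUT,      /* iconv() failed with EINVAL: input ends mid-character */
    CONVERT_FAILED              /* anything else */
};

/* Try to convert `len` bytes of `in` from charset `from` to charset `to`
 * and report how the attempt ended.  The converted bytes are discarded. */
enum convert_status classify_conversion(const char *in, size_t len,
                                        const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return errno == EINVAL ? CONVERT_NO_CONVERSION : CONVERT_FAILED;

    char scratch[256];
    char *in_ptr = (char *)in;           /* iconv() wants char **, input is only read */
    size_t in_left = len;
    enum convert_status status = CONVERT_OK;

    while (in_left > 0 && status == CONVERT_OK) {
        char *out_ptr = scratch;         /* reuse the scratch buffer on every pass */
        size_t out_left = sizeof scratch;

        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) != (size_t)-1)
            break;                       /* all remaining input was consumed */

        if (errno == E2BIG)
            continue;                    /* scratch buffer full: go around again */
        else if (errno == EILSEQ)
            status = CONVERT_ILLEGAL_SEQUENCE;
        else if (errno == EINVAL)
            status = CONVERT_PARTIAL_INPUT;
        else
            status = CONVERT_FAILED;
    }

    iconv_close(cd);
    return status;
}
```
]]></programlisting></informalexample>

<para>
For instance, a UTF-8 input that ends after the lone byte
<literal>c3</literal> stops in the middle of a character, which is the
situation the partial-input code describes.
</para>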

@G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character sets
is not supported.
@G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
@G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
@G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
@G_CONVERT_ERROR_BAD_URI: URI is invalid.
@G_CONVERT_ERROR_NOT_ABSOLUTE_PATH: Pathname is not an absolute path.

<!-- ##### FUNCTION g_get_charset ##### -->
<para>

</para>

@charset:
@Returns:


<!--
Local variables:
mode: sgml
sgml-parent-document: ("../glib-docs.sgml" "book" "refentry")
End:
-->