1\input texinfo          @c -*-texinfo-*-
2@c %**start of header
3@setfilename gettext.info
4@c The @ifset makeinfo ... @end ifset conditional evaluates to true in makeinfo
5@c for info and html output, but to false in texi2html.
6@ifnottex
7@ifclear texi2html
8@set makeinfo
9@end ifclear
10@end ifnottex
11@c The @documentencoding is needed for makeinfo; texi2html 1.52
12@c doesn't recognize it.
13@ifset makeinfo
14@documentencoding UTF-8
15@end ifset
16@settitle GNU @code{gettext} utilities
17@finalout
18@c Indices:
19@c   am = autoconf macro  @amindex
20@c   cp = concept         @cindex
21@c   ef = emacs function  @efindex
22@c   em = emacs mode      @emindex
23@c   ev = emacs variable  @evindex
24@c   fn = function        @findex
25@c   kw = keyword         @kwindex
26@c   op = option          @opindex
27@c   pg = program         @pindex
28@c   vr = variable        @vindex
29@c Unused predefined indices:
30@c   tp = type            @tindex
31@c   ky = keystroke       @kindex
32@defcodeindex am
33@defcodeindex ef
34@defindex em
35@defcodeindex ev
36@defcodeindex kw
37@defcodeindex op
38@syncodeindex ef em
39@syncodeindex ev em
40@syncodeindex fn cp
41@syncodeindex kw cp
42@ifclear texi2html
43@firstparagraphindent insert
44@end ifclear
45@c %**end of header
46
47@include version.texi
48
49@ifinfo
50@dircategory GNU Gettext Utilities
51@direntry
52* gettext: (gettext).                          GNU gettext utilities.
53* autopoint: (gettext)autopoint Invocation.    Copy gettext infrastructure.
54* envsubst: (gettext)envsubst Invocation.      Expand environment variables.
55* gettextize: (gettext)gettextize Invocation.  Prepare a package for gettext.
56* msgattrib: (gettext)msgattrib Invocation.    Select part of a PO file.
57* msgcat: (gettext)msgcat Invocation.          Combine several PO files.
58* msgcmp: (gettext)msgcmp Invocation.          Compare a PO file and template.
59* msgcomm: (gettext)msgcomm Invocation.        Match two PO files.
60* msgconv: (gettext)msgconv Invocation.        Convert PO file to encoding.
61* msgen: (gettext)msgen Invocation.            Create an English PO file.
62* msgexec: (gettext)msgexec Invocation.        Process a PO file.
63* msgfilter: (gettext)msgfilter Invocation.    Pipe a PO file through a filter.
64* msgfmt: (gettext)msgfmt Invocation.          Make MO files out of PO files.
65* msggrep: (gettext)msggrep Invocation.        Select part of a PO file.
66* msginit: (gettext)msginit Invocation.        Create a fresh PO file.
67* msgmerge: (gettext)msgmerge Invocation.      Update a PO file from template.
68* msgunfmt: (gettext)msgunfmt Invocation.      Uncompile MO file into PO file.
69* msguniq: (gettext)msguniq Invocation.        Unify duplicates for PO file.
70* ngettext: (gettext)ngettext Invocation.      Translate a message with plural.
71* xgettext: (gettext)xgettext Invocation.      Extract strings into a PO file.
72* ISO639: (gettext)Language Codes.             ISO 639 language codes.
73* ISO3166: (gettext)Country Codes.             ISO 3166 country codes.
74@end direntry
75@end ifinfo
76
77@ifinfo
78This file provides documentation for GNU @code{gettext} utilities.
79It also serves as a reference for the free Translation Project.
80
81@copying
82Copyright (C) 1995-1998, 2001-2020 Free Software Foundation, Inc.
83
84This manual is free documentation.  It is dually licensed under the
85GNU FDL and the GNU GPL.  This means that you can redistribute this
86manual under either of these two licenses, at your choice.
87
88This manual is covered by the GNU FDL.  Permission is granted to copy,
89distribute and/or modify this document under the terms of the
90GNU Free Documentation License (FDL), either version 1.2 of the
91License, or (at your option) any later version published by the
92Free Software Foundation (FSF); with no Invariant Sections, with no
93Front-Cover Text, and with no Back-Cover Texts.
94A copy of the license is included in @ref{GNU FDL}.
95
96This manual is covered by the GNU GPL.  You can redistribute it and/or
97modify it under the terms of the GNU General Public License (GPL), either
98version 2 of the License, or (at your option) any later version published
99by the Free Software Foundation (FSF).
100A copy of the license is included in @ref{GNU GPL}.
101@end copying
102@end ifinfo
103
104@titlepage
105@title GNU gettext tools, version @value{VERSION}
106@subtitle Native Language Support Library and Tools
107@subtitle Edition @value{EDITION}, @value{UPDATED}
108@author Ulrich Drepper
109@author Jim Meyering
110@author Fran@,{c}ois Pinard
111@author Bruno Haible
112
113@ifnothtml
114@page
115@vskip 0pt plus 1filll
116@c @insertcopying
117Copyright (C) 1995-1998, 2001-2020 Free Software Foundation, Inc.
118
119This manual is free documentation.  It is dually licensed under the
120GNU FDL and the GNU GPL.  This means that you can redistribute this
121manual under either of these two licenses, at your choice.
122
123This manual is covered by the GNU FDL.  Permission is granted to copy,
124distribute and/or modify this document under the terms of the
125GNU Free Documentation License (FDL), either version 1.2 of the
126License, or (at your option) any later version published by the
127Free Software Foundation (FSF); with no Invariant Sections, with no
128Front-Cover Text, and with no Back-Cover Texts.
129A copy of the license is included in @ref{GNU FDL}.
130
131This manual is covered by the GNU GPL.  You can redistribute it and/or
132modify it under the terms of the GNU General Public License (GPL), either
133version 2 of the License, or (at your option) any later version published
134by the Free Software Foundation (FSF).
135A copy of the license is included in @ref{GNU GPL}.
136@end ifnothtml
137@end titlepage
138
139@c Table of Contents
140@contents
141
142@ifnottex
143@node Top
144@top GNU @code{gettext} utilities
145
146This manual documents the GNU gettext tools and the GNU libintl library,
147version @value{VERSION}.
148
149@menu
150* Introduction::                Introduction
151* Users::                       The User's View
152* PO Files::                    The Format of PO Files
153* Sources::                     Preparing Program Sources
154* Template::                    Making the PO Template File
155* Creating::                    Creating a New PO File
156* Updating::                    Updating Existing PO Files
157* Editing::                     Editing PO Files
158* Manipulating::                Manipulating PO Files
159* Binaries::                    Producing Binary MO Files
160* Programmers::                 The Programmer's View
161* Translators::                 The Translator's View
162* Maintainers::                 The Maintainer's View
163* Installers::                  The Installer's and Distributor's View
164* Programming Languages::       Other Programming Languages
165* Data Formats::                Other Data Formats
166* Conclusion::                  Concluding Remarks
167
168* Language Codes::              ISO 639 language codes
169* Country Codes::               ISO 3166 country codes
170* Licenses::                    Licenses
171
172* Program Index::               Index of Programs
173* Option Index::                Index of Command-Line Options
174* Variable Index::              Index of Environment Variables
175* PO Mode Index::               Index of Emacs PO Mode Commands
176* Autoconf Macro Index::        Index of Autoconf Macros
177* Index::                       General Index
178
179@detailmenu
180 --- The Detailed Node Listing ---
181
182Introduction
183
184* Why::                         The Purpose of GNU @code{gettext}
185* Concepts::                    I18n, L10n, and Such
186* Aspects::                     Aspects in Native Language Support
187* Files::                       Files Conveying Translations
188* Overview::                    Overview of GNU @code{gettext}
189
190The User's View
191
192* System Installation::         Questions During Operating System Installation
193* Setting the GUI Locale::      How to Specify the Locale Used by GUI Programs
194* Setting the POSIX Locale::    How to Specify the Locale According to POSIX
195* Working in a Windows console::  Obtaining good output in a Windows console
196* Installing Localizations::    How to Install Additional Translations
197
198Setting the Locale through Environment Variables
199
* Locale Names::                What a Locale Specification Looks Like
* Locale Environment Variables::  Which Environment Variable Specifies What
202* The LANGUAGE variable::       How to Specify a Priority List of Languages
203
204Preparing Program Sources
205
206* Importing::                   Importing the @code{gettext} declaration
207* Triggering::                  Triggering @code{gettext} Operations
208* Preparing Strings::           Preparing Translatable Strings
209* Mark Keywords::               How Marks Appear in Sources
210* Marking::                     Marking Translatable Strings
211* c-format Flag::               Telling something about the following string
212* Special cases::               Special Cases of Translatable Strings
213* Bug Report Address::          Letting Users Report Translation Bugs
214* Names::                       Marking Proper Names for Translation
215* Libraries::                   Preparing Library Sources
216
217Making the PO Template File
218
219* xgettext Invocation::         Invoking the @code{xgettext} Program
220
221Creating a New PO File
222
223* msginit Invocation::          Invoking the @code{msginit} Program
224* Header Entry::                Filling in the Header Entry
225
226Updating Existing PO Files
227
228* msgmerge Invocation::         Invoking the @code{msgmerge} Program
229
230Editing PO Files
231
232* KBabel::                      KDE's PO File Editor
233* Gtranslator::                 GNOME's PO File Editor
234* PO Mode::                     Emacs's PO File Editor
235* Compendium::                  Using Translation Compendia
236
237Emacs's PO File Editor
238
239* Installation::                Completing GNU @code{gettext} Installation
240* Main PO Commands::            Main Commands
241* Entry Positioning::           Entry Positioning
242* Normalizing::                 Normalizing Strings in Entries
243* Translated Entries::          Translated Entries
244* Fuzzy Entries::               Fuzzy Entries
245* Untranslated Entries::        Untranslated Entries
246* Obsolete Entries::            Obsolete Entries
247* Modifying Translations::      Modifying Translations
248* Modifying Comments::          Modifying Comments
249* Subedit::                     Mode for Editing Translations
250* C Sources Context::           C Sources Context
251* Auxiliary::                   Consulting Auxiliary PO Files
252
253Using Translation Compendia
254
255* Creating Compendia::          Merging translations for later use
256* Using Compendia::             Using older translations if they fit
257
258Manipulating PO Files
259
260* msgcat Invocation::           Invoking the @code{msgcat} Program
261* msgconv Invocation::          Invoking the @code{msgconv} Program
262* msggrep Invocation::          Invoking the @code{msggrep} Program
263* msgfilter Invocation::        Invoking the @code{msgfilter} Program
264* msguniq Invocation::          Invoking the @code{msguniq} Program
265* msgcomm Invocation::          Invoking the @code{msgcomm} Program
266* msgcmp Invocation::           Invoking the @code{msgcmp} Program
267* msgattrib Invocation::        Invoking the @code{msgattrib} Program
268* msgen Invocation::            Invoking the @code{msgen} Program
269* msgexec Invocation::          Invoking the @code{msgexec} Program
270* Colorizing::                  Highlighting parts of PO files
271* Other tools::                 Other tools for manipulating PO files
272* libgettextpo::                Writing your own programs that process PO files
273
274Highlighting parts of PO files
275
276* The --color option::          Triggering colorized output
277* The TERM variable::           The environment variable @code{TERM}
278* The --style option::          The @code{--style} option
279* Style rules::                 Style rules for PO files
280* Customizing less::            Customizing @code{less} for viewing PO files
281
282Producing Binary MO Files
283
284* msgfmt Invocation::           Invoking the @code{msgfmt} Program
285* msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
286* MO Files::                    The Format of GNU MO Files
287
288The Programmer's View
289
290* catgets::                     About @code{catgets}
291* gettext::                     About @code{gettext}
292* Comparison::                  Comparing the two interfaces
293* Using libintl.a::             Using libintl.a in own programs
294* gettext grok::                Being a @code{gettext} grok
295* Temp Programmers::            Temporary Notes for the Programmers Chapter
296
297About @code{catgets}
298
299* Interface to catgets::        The interface
300* Problems with catgets::       Problems with the @code{catgets} interface?!
301
302About @code{gettext}
303
304* Interface to gettext::        The interface
305* Ambiguities::                 Solving ambiguities
306* Locating Catalogs::           Locating message catalog files
307* Charset conversion::          How to request conversion to Unicode
308* Contexts::                    Solving ambiguities in GUI programs
309* Plural forms::                Additional functions for handling plurals
310* Optimized gettext::           Optimization of the *gettext functions
311
312Temporary Notes for the Programmers Chapter
313
314* Temp Implementations::        Temporary - Two Possible Implementations
315* Temp catgets::                Temporary - About @code{catgets}
316* Temp WSI::                    Temporary - Why a single implementation
317* Temp Notes::                  Temporary - Notes
318
319The Translator's View
320
321* Trans Intro 0::               Introduction 0
322* Trans Intro 1::               Introduction 1
323* Discussions::                 Discussions
324* Organization::                Organization
325* Information Flow::            Information Flow
326* Translating plural forms::    How to fill in @code{msgstr[0]}, @code{msgstr[1]}
327* Prioritizing messages::       How to find which messages to translate first
328
329Organization
330
331* Central Coordination::        Central Coordination
332* National Teams::              National Teams
333* Mailing Lists::               Mailing Lists
334
335National Teams
336
337* Sub-Cultures::                Sub-Cultures
338* Organizational Ideas::        Organizational Ideas
339
340The Maintainer's View
341
342* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
343* Prerequisites::               Prerequisite Works
344* gettextize Invocation::       Invoking the @code{gettextize} Program
345* Adjusting Files::             Files You Must Create or Alter
346* autoconf macros::             Autoconf macros for use in @file{configure.ac}
347* Version Control Issues::
348* Release Management::          Creating a Distribution Tarball
349
350Files You Must Create or Alter
351
352* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
353* po/LINGUAS::                  @file{LINGUAS} in @file{po/}
354* po/Makevars::                 @file{Makevars} in @file{po/}
355* po/Rules-*::                  Extending @file{Makefile} in @file{po/}
356* configure.ac::                @file{configure.ac} at top level
357* config.guess::                @file{config.guess}, @file{config.sub} at top level
358* mkinstalldirs::               @file{mkinstalldirs} at top level
359* aclocal::                     @file{aclocal.m4} at top level
360* config.h.in::                 @file{config.h.in} at top level
361* Makefile::                    @file{Makefile.in} at top level
362* src/Makefile::                @file{Makefile.in} in @file{src/}
363* lib/gettext.h::               @file{gettext.h} in @file{lib/}
364
365Autoconf macros for use in @file{configure.ac}
366
367* AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
368* AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
369* AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
370* AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
371* AM_XGETTEXT_OPTION::          AM_XGETTEXT_OPTION in @file{po.m4}
372* AM_ICONV::                    AM_ICONV in @file{iconv.m4}
373
374Integrating with Version Control Systems
375
376* Distributed Development::     Avoiding version mismatch in distributed development
377* Files under Version Control::  Files to put under version control
378* Translations under Version Control::  Put PO Files under Version Control
379* autopoint Invocation::        Invoking the @code{autopoint} Program
380
381Other Programming Languages
382
383* Language Implementors::       The Language Implementor's View
384* Programmers for other Languages::  The Programmer's View
385* Translators for other Languages::  The Translator's View
386* Maintainers for other Languages::  The Maintainer's View
387* List of Programming Languages::  Individual Programming Languages
388
389The Translator's View
390
391* c-format::                    C Format Strings
392* objc-format::                 Objective C Format Strings
393* python-format::               Python Format Strings
394* java-format::                 Java Format Strings
395* csharp-format::               C# Format Strings
396* javascript-format::           JavaScript Format Strings
397* scheme-format::               Scheme Format Strings
398* lisp-format::                 Lisp Format Strings
399* elisp-format::                Emacs Lisp Format Strings
400* librep-format::               librep Format Strings
401* ruby-format::                 Ruby Format Strings
402* sh-format::                   Shell Format Strings
403* awk-format::                  awk Format Strings
404* lua-format::                  Lua Format Strings
405* object-pascal-format::        Object Pascal Format Strings
406* smalltalk-format::            Smalltalk Format Strings
407* qt-format::                   Qt Format Strings
408* qt-plural-format::            Qt Plural Format Strings
409* kde-format::                  KDE Format Strings
410* kde-kuit-format::             KUIT Format Strings
411* boost-format::                Boost Format Strings
412* tcl-format::                  Tcl Format Strings
413* perl-format::                 Perl Format Strings
414* php-format::                  PHP Format Strings
415* gcc-internal-format::         GCC internal Format Strings
416* gfc-internal-format::         GFC internal Format Strings
417* ycp-format::                  YCP Format Strings
418
419Individual Programming Languages
420
421* C::                           C, C++, Objective C
422* Python::                      Python
423* Java::                        Java
424* C#::                          C#
425* JavaScript::                  JavaScript
426* Scheme::                      GNU guile - Scheme
427* Common Lisp::                 GNU clisp - Common Lisp
428* clisp C::                     GNU clisp C sources
429* Emacs Lisp::                  Emacs Lisp
430* librep::                      librep
431* Ruby::                        Ruby
432* sh::                          sh - Shell Script
433* bash::                        bash - Bourne-Again Shell Script
434* gawk::                        GNU awk
435* Lua::                         Lua
436* Pascal::                      Pascal - Free Pascal Compiler
437* Smalltalk::                   GNU Smalltalk
438* Vala::                        Vala
439* wxWidgets::                   wxWidgets library
440* Tcl::                         Tcl - Tk's scripting language
441* Perl::                        Perl
442* PHP::                         PHP Hypertext Preprocessor
443* Pike::                        Pike
444* GCC-source::                  GNU Compiler Collection sources
445* YCP::                         YCP - YaST2 scripting language
446
447sh - Shell Script
448
449* Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
450* gettext.sh::                  Contents of @code{gettext.sh}
451* gettext Invocation::          Invoking the @code{gettext} program
452* ngettext Invocation::         Invoking the @code{ngettext} program
453* envsubst Invocation::         Invoking the @code{envsubst} program
454* eval_gettext Invocation::     Invoking the @code{eval_gettext} function
455* eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
456* eval_pgettext Invocation::    Invoking the @code{eval_pgettext} function
457* eval_npgettext Invocation::   Invoking the @code{eval_npgettext} function
458
459Perl
460
461* General Problems::            General Problems Parsing Perl Code
462* Default Keywords::            Which Keywords Will xgettext Look For?
463* Special Keywords::            How to Extract Hash Keys
464* Quote-like Expressions::      What are Strings And Quote-like Expressions?
465* Interpolation I::             Invalid String Interpolation
466* Interpolation II::            Valid String Interpolation
467* Parentheses::                 When To Use Parentheses
468* Long Lines::                  How To Grok with Long Lines
469* Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
470
471Other Data Formats
472
473* Internationalizable Data::    Internationalizable Data Formats
474* Localized Data::              Localized Data Formats
475
476Internationalizable Data Formats
477
478* POT::                         POT - Portable Object Template
479* RST::                         Resource String Table
480* Glade::                       Glade - GNOME user interface description
481* GSettings::                   GSettings - GNOME user configuration schema
482* AppData::                     AppData - freedesktop.org application description
483* Preparing ITS Rules::         Preparing Rules for XML Internationalization
484
485Localized Data Formats
486
487* Editable Message Catalogs::   Editable Message Catalogs
488* Compiled Message Catalogs::   Compiled Message Catalogs
489* Desktop Entry::               Desktop Entry files
490* XML::                         XML files
491
492Editable Message Catalogs
493
494* PO::                          PO - Portable Object
495* Java .properties::            Java .properties
496* GNUstep .strings::            NeXTstep/GNUstep .strings
497
498Compiled Message Catalogs
499
500* MO::                          MO - Machine Object
501* Java ResourceBundle::         Java ResourceBundle
502* C# Satellite Assembly::       C# Satellite Assembly
503* C# Resource::                 C# Resource
504* Tcl message catalog::         Tcl message catalog
505* Qt message catalog::          Qt message catalog
506
507Concluding Remarks
508
509* History::                     History of GNU @code{gettext}
510* The original ABOUT-NLS::      Historical introduction
511* References::                  Related Readings
512
513Language Codes
514
515* Usual Language Codes::        Two-letter ISO 639 language codes
516* Rare Language Codes::         Three-letter ISO 639 language codes
517
518Licenses
519
520* GNU GPL::                     GNU General Public License
521* GNU LGPL::                    GNU Lesser General Public License
522* GNU FDL::                     GNU Free Documentation License
523
524@end detailmenu
525@end menu
526
527@end ifnottex
528
529@node Introduction
530@chapter Introduction
531
532This chapter explains the goals sought in the creation
533of GNU @code{gettext} and the free Translation Project.
534Then, it explains a few broad concepts around
535Native Language Support, and positions message translation with regard
536to other aspects of national and cultural variance, as they apply
537to programs.  It also surveys those files used to convey the
538translations.  It explains how the various tools interact in the
539initial generation of these files, and later, how the maintenance
540cycle should usually operate.
541
542@cindex sex
543@cindex he, she, and they
544@cindex she, he, and they
545In this manual, we use @emph{he} when speaking of the programmer or
546maintainer, @emph{she} when speaking of the translator, and @emph{they}
547when speaking of the installers or end users of the translated program.
548This is only a convenience for clarifying the documentation.  It is
549@emph{absolutely} not meant to imply that some roles are more appropriate
550to males or females.  Besides, as you might guess, GNU @code{gettext}
551is meant to be useful for people using computers, whatever their sex,
552race, religion or nationality!
553
554@cindex bug report address
555Please submit suggestions and corrections
556@itemize @bullet
557@item
558either in the bug tracker at @url{https://savannah.gnu.org/projects/gettext}
559@item
560or by email to @code{bug-gettext@@gnu.org}.
561@end itemize
562
563@noindent
564Please include the manual's edition number and update date in your messages.
565
566@menu
567* Why::                         The Purpose of GNU @code{gettext}
568* Concepts::                    I18n, L10n, and Such
569* Aspects::                     Aspects in Native Language Support
570* Files::                       Files Conveying Translations
571* Overview::                    Overview of GNU @code{gettext}
572@end menu
573
574@node Why
575@section The Purpose of GNU @code{gettext}
576
Usually, programs are written and documented in English, and use
English at execution time to interact with users.  This is true
not only of GNU software, but also of a great deal of proprietary
and free software.  Using a common language is quite handy for
communication between developers, maintainers and users from all
countries.  On the other hand, most people are less comfortable with
English than with their own native language, and would prefer to
use their mother tongue for day-to-day work, as far as possible.
Many would simply @emph{love} to see their computer screen showing
a lot less English, and far more of their own language.
587
588@cindex Translation Project
However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it.  They have no confidence at all that the dream might ever
come true.  Yet some have not lost hope, and have organized themselves.
The Translation Project is a formalization of this hope into a
workable structure, which has a good chance of bringing all of us nearer
to a truly multi-lingual set of programs.
596
GNU @code{gettext} is an important step for the Translation Project,
as it is an asset on which we may build many other steps.  This package
offers programmers, translators, and even users a well-integrated
set of tools and documentation.  Specifically, the GNU @code{gettext}
utilities are a set of tools that provides a framework within which
other free packages may produce multi-lingual messages.  These tools
include
604
@itemize @bullet
@item
A set of conventions about how programs should be written to support
message catalogs.

@item
A directory and file naming organization for the message catalogs
themselves.

@item
A runtime library supporting the retrieval of translated messages.

@item
A few stand-alone programs that manipulate, in various ways, the sets of
translatable strings or already translated strings.

@item
A library supporting the parsing and creation of files containing
translated messages.

@item
A special mode for Emacs@footnote{In this manual, all mentions of Emacs
refer to either GNU Emacs or to XEmacs, which people sometimes call FSF
Emacs and Lucid Emacs, respectively.} which helps in preparing these sets
and bringing them up to date.
@end itemize
631
GNU @code{gettext} is designed to minimize the impact of
internationalization on program sources, keeping this impact as small
and hardly noticeable as possible.  Internationalization has a better
chance of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.
637
The Translation Project also uses the GNU @code{gettext} distribution
as a vehicle for documenting its structure and methods.  This goes
beyond the strict technicalities of documenting GNU @code{gettext}
proper.  This way, translators will find in a single place, as
far as possible, all they need to know for properly doing their
translating work.  This supplemental documentation might also
help programmers, and even curious users, in understanding how GNU
@code{gettext} is related to the remainder of the Translation
Project, and consequently, get a glimpse of the @emph{big picture}.
647
648@node Concepts
649@section I18n, L10n, and Such
650
651@cindex i18n
652@cindex l10n
Two long words appear all the time when we discuss support of native
language in programs, and these words have a precise meaning, worth
being explained here, once and for all in this document.  The words are
@emph{internationalization} and @emph{localization}.  Many people,
tired of writing these long words over and over again, have taken up the
habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first
and last letter of each word, and replacing the run of intermediate
letters by a number merely telling how many such letters there are.
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time@dots{}
663
664@cindex internationalization
By @dfn{internationalization}, one refers to the operation by which a
program, or a set of programs turned into a package, is made aware of and
able to support multiple languages.  This is a generalization process,
by which the programs are untied from calling only English strings or
other English-specific habits, and connected to generic ways of doing
the same, instead.  Program developers may use various techniques to
internationalize their programs.  Some of these have been standardized.
GNU @code{gettext} offers one of these standards.  @xref{Programmers}.
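
As a rough illustration of what untying a program from English strings
can look like in C, the following minimal sketch wraps a message in a
call to @code{gettext}.  The text domain name @code{example-pkg}, the
catalog directory @file{/usr/local/share/locale} and the function name
@code{translated_greeting} are placeholders assumed for this
illustration only; they are not prescribed by this manual.  On systems
where @code{gettext} is not part of the C library, the program may also
need to be linked with @code{-lintl}.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical values chosen for this sketch only. */
#define EXAMPLE_DOMAIN "example-pkg"
#define EXAMPLE_LOCALEDIR "/usr/local/share/locale"

/* Conventional shorthand for marking translatable strings. */
#define _(msgid) gettext (msgid)

/* Return the greeting in the user's language when a message catalog
   for EXAMPLE_DOMAIN is installed; otherwise gettext returns the
   English msgid unchanged. */
const char *
translated_greeting (void)
{
  /* Adopt the locale requested by the user's environment. */
  setlocale (LC_ALL, "");
  /* Tell the library where the catalogs of this domain are located. */
  bindtextdomain (EXAMPLE_DOMAIN, EXAMPLE_LOCALEDIR);
  /* Make EXAMPLE_DOMAIN the default domain for gettext calls. */
  textdomain (EXAMPLE_DOMAIN);
  return _("Hello, world!");
}
```

A caller would typically pass the returned string to an output
function such as @code{puts}.  The only change visible at the point of
use is the @code{_(...)} wrapper around the string.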
673
674@cindex localization
By @dfn{localization}, one means the operation by which, in a set
of programs already internationalized, one gives the program all
needed information so that it can adapt itself to handle its input
and output in a fashion which is correct for some native language and
cultural habits.  This is a particularization process, by which generic
methods already implemented in an internationalized program are used
in specific ways.  The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
country, together with all associated translations targeted to the
same native language, is called the @dfn{locale} for this language
or country.  Users achieve localization of programs by setting proper
values to special environment variables, prior to executing those
programs, identifying which locale should be used.
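
The role of the environment variables can be sketched in C as follows.
This is only an illustration under stated assumptions: the helper name
@code{select_messages_locale} is hypothetical, and it sets @code{LC_ALL}
from inside the program merely to imitate what a user would normally do
in the shell before starting the program.

```c
#define _POSIX_C_SOURCE 200112L

#include <locale.h>
#include <stdlib.h>

/* Imitate a user who sets LC_ALL before running the program, then let
   the C library select the message locale from the environment.
   Returns the name of the selected locale, or NULL when the variable
   could not be set or the requested locale is not available. */
const char *
select_messages_locale (const char *requested)
{
  if (setenv ("LC_ALL", requested, 1) != 0)
    return NULL;
  /* An empty locale name means "consult the environment". */
  return setlocale (LC_MESSAGES, "");
}
```

Calling @code{select_messages_locale ("C")} should always succeed, since
the @code{C} locale is always available.  Names of other locales only
work when the corresponding locale data is installed on the system.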
689
In fact, locale message support is only one component of the cultural
data that makes up a particular locale.  There is a whole host of
routines and functions provided to aid programmers in developing
internationalized software, which allow them to access the data
stored in a particular locale.  When someone refers to a
particular locale, they are referring to the data stored
within that particular locale.  Similarly, if a programmer is referring
to ``accessing the locale routines'', they are referring to the
complete suite of routines that access all of the locale's information.
699
700@cindex NLS
701@cindex Native Language Support
702@cindex Natural Language Support
703One uses the expression @dfn{Native Language Support}, or merely NLS,
704for speaking of the overall activity or feature encompassing both
705internationalization and localization, allowing for multi-lingual
706interactions in a program.  In a nutshell, one could say that
707internationalization is the operation by which further localizations
708are made possible.
709
710Also, very roughly said, when it comes to multi-lingual messages,
711internationalization is usually taken care of by programmers, and
712localization is usually taken care of by translators.
713
714@node Aspects
715@section Aspects in Native Language Support
716
717@cindex translation aspects
718For a totally multi-lingual distribution, there are many things to
719translate beyond output messages.
720
721@itemize @bullet
722@item
723As of today, GNU @code{gettext} offers a complete toolset for
724translating messages output by C programs.  Perl scripts and shell
725scripts will also need to be translated.  Even if there are today some hooks
726by which this can be done, these hooks are not integrated as well as they
727should be.
728
729@item
730Some programs, like @code{autoconf} or @code{bison}, are able
731to produce other programs (or scripts).  Even if the generating
732programs themselves are internationalized, the generated programs they
733produce may need internationalization on their own, and this indirect
734internationalization could be automated right from the generating
735program.  In fact, quite usually, generating and generated programs
736could be internationalized independently, as the effort needed is
737fairly orthogonal.
738
739@item
740A few programs include textual tables which might need translation
741themselves, independently of the strings contained in the program
742itself.  For example, @w{RFC 1345} gives an English description for each
743character which the @code{recode} program is able to reconstruct at execution.
744Since these descriptions are extracted from the RFC by mechanical means,
745translating them properly would require a prior translation of the RFC
746itself.
747
748@item
Almost all programs accept options, which are often worded so as to
be descriptive for English readers; one might want to consider
751offering translated versions for program options as well.
752
753@item
754Many programs read, interpret, compile, or are somewhat driven by
755input files which are texts containing keywords, identifiers, or
756replies which are inherently translatable.  For example, one may want
757@code{gcc} to allow diacriticized characters in identifiers or use
translated keywords; @samp{rm -i} might accept something other than
759@samp{y} or @samp{n} for replies, etc.  Even if the program will
760eventually make most of its output in the foreign languages, one has
761to decide whether the input syntax, option values, etc., are to be
762localized or not.
763
764@item
765The manual accompanying a package, as well as all documentation files
766in the distribution, could surely be translated, too.  Translating a
767manual, with the intent of later keeping up with updates, is a major
768undertaking in itself, generally.
769
770@end itemize
771
772As we already stressed, translation is only one aspect of locales.
773Other internationalization aspects are system services and are handled
774in GNU @code{libc}.  There
775are many attributes that are needed to define a country's cultural
conventions.  These attributes include, besides the country's native
777language, the formatting of the date and time, the representation of
778numbers, the symbols for currency, etc.  These local @dfn{rules} are
779termed the country's locale.  The locale represents the knowledge
780needed to support the country's native attributes.
781
782@cindex locale categories
783There are a few major areas which may vary between countries and
hence define what a locale must describe.  The following list helps
put multi-lingual messages into the proper context of other tasks
786related to locales.  See the GNU @code{libc} manual for details.
787
788@table @emph
789
790@item Characters and Codesets
791@cindex codeset
792@cindex encoding
793@cindex character encoding
794@cindex locale category, LC_CTYPE
795
The codeset most commonly used throughout the USA and most English
speaking parts of the world is the ASCII codeset.  However, there are
many characters needed by various locales that are not found within
this codeset.  The 8-bit @w{ISO 8859-1} codeset has most of the special
characters needed to handle the major European languages.  However, in
many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it
does not even contain the sign of the major European currency, the Euro.
Hence each locale needs to specify which codeset it uses and needs
to have the appropriate character handling routines to cope with
the codeset.
806
807@item Currency
808@cindex currency symbols
809@cindex locale category, LC_MONETARY
810
811The symbols used vary from country to country as does the position
812used by the symbol.  Software needs to be able to transparently
813display currency figures in the native mode for each locale.
814
815@item Dates
816@cindex date format
817@cindex locale category, LC_TIME
818
The format of dates varies between locales.  For example, Christmas day
820in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
821Other countries might use @w{ISO 8601} dates, etc.
822
823Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
824or otherwise.  Some locales require time to be specified in 24-hour
825mode rather than as AM or PM.  Further, the nature and yearly extent
826of the Daylight Saving correction vary widely between countries.
827
828@item Numbers
829@cindex number format
830@cindex locale category, LC_NUMERIC
831
832Numbers can be represented differently in different locales.
833For example, the following numbers are all written correctly for
834their respective locales:
835
@example
12,345.67       English
12.345,67       German
 12345,67       French
1,2345.67       Asia
@end example
842
843Some programs could go further and use different unit systems, like
844English units or Metric units, or even take into account variants
845about how numbers are spelled in full.
846
847@item Messages
848@cindex messages
849@cindex locale category, LC_MESSAGES
850
851The most obvious area is the language support within a locale.  This is
852where GNU @code{gettext} provides the means for developers and users to
853easily change the language that the software uses to communicate to
854the user.
855
856@end table
857
858@cindex locale categories
859These areas of cultural conventions are called @emph{locale categories}.
860It is an unfortunate term; @emph{locale aspects} or @emph{locale feature
861categories} would be a better term, because each ``locale category''
862describes an area or task that requires localization.  The concrete data
863that describes the cultural conventions for such an area and for a particular
864culture is also called a @emph{locale category}.  In this sense, a locale
865is composed of several locale categories: the locale category describing
866the codeset, the locale category describing the formatting of numbers,
867the locale category containing the translated messages, and so on.
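
As an illustration of how a program reads the data of a single locale
category, the following sketch queries the number formatting conventions
through the ISO C functions @code{setlocale} and @code{localeconv}.  The
helper name @code{describe_numeric_locale} is made up for this example,
and which locale names are available depends on the system.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: describe the LC_NUMERIC conventions of LOCALE_NAME
   in BUF.  Returns 0 on success, -1 if the locale is not available.  */
int
describe_numeric_locale (const char *locale_name, char *buf, size_t size)
{
  const struct lconv *conventions;

  /* Switch only the number formatting category.  */
  if (setlocale (LC_NUMERIC, locale_name) == NULL)
    return -1;

  conventions = localeconv ();
  snprintf (buf, size, "decimal point \"%s\", thousands separator \"%s\"",
            conventions->decimal_point, conventions->thousands_sep);
  return 0;
}
```

Called with the locale name @code{"C"}, the helper reports a period as
decimal point and an empty thousands separator.  With a German locale,
if one is installed, a comma would be expected as decimal point instead.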
868
869@cindex Linux
870Components of locale outside of message handling are standardized in
the ISO C standard and the POSIX:2001 standard (also known as the SUSv3
specification).  GNU @code{libc}
873fully implements this, and most other modern systems provide a more
874or less reasonable support for at least some of the missing components.
875
876@node Files
877@section Files Conveying Translations
878
879@cindex files, @file{.po} and @file{.mo}
The letters PO in @file{.po} files mean Portable Object, to
distinguish them from @file{.mo} files, where MO stands for Machine
882Object.  This paradigm, as well as the PO file format, is inspired
883by the NLS standard developed by Uniforum, and first implemented by
884Sun in their Solaris system.
885
886PO files are meant to be read and edited by humans, and associate each
887original, translatable string of a given package with its translation
888in a particular target language.  A single PO file is dedicated to
889a single target language.  If a package supports many languages,
890there is one such PO file per language supported, and each package
891has its own set of PO files.  These PO files are best created by
892the @code{xgettext} program, and later updated or refreshed through
893the @code{msgmerge} program.  Program @code{xgettext} extracts all
894marked messages from a set of C files and initializes a PO file with
895empty translations.  Program @code{msgmerge} takes care of adjusting
896PO files between releases of the corresponding sources, commenting
897obsolete entries, initializing new ones, and updating all source
line references.  Files ending with @file{.pot} are template files found
in distributions; they use the PO file format and serve as the base for
new translations.
900
901MO files are meant to be read by programs, and are binary in nature.
902A few systems already offer tools for creating and handling MO files
903as part of the Native Language Support coming with the system, but the
904format of these MO files is often different from system to system,
905and non-portable.  The tools already provided with these systems don't
906support all the features of GNU @code{gettext}.  Therefore GNU
907@code{gettext} uses its own format for MO files.  Files ending with
908@file{.gmo} are really MO files, when it is known that these files use
909the GNU format.
910
911@node Overview
912@section Overview of GNU @code{gettext}
913
914@cindex overview of @code{gettext}
915@cindex big picture
916@cindex tutorial of @code{gettext} usage
917The following diagram summarizes the relation between the files
918handled by GNU @code{gettext} and the tools acting on these files.
919It is followed by somewhat detailed explanations, which you should
920read while keeping an eye on the diagram.  Having a clear understanding
921of these interrelations will surely help programmers, translators
922and maintainers.
923
@ifhtml
@example
@group
Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯
@end group
@end example
@end ifhtml
@ifnothtml
@example
@group
Original C Sources ---> Preparation ---> Marked C Sources ---.
                                                             |
              .---------<--- GNU gettext Library             |
.--- make <---+                                              |
|             `---------<--------------------+---------------'
|                                            |
|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
|   |                                            |              ^
|   |                                            `---.          |
|   `---.                                            +---> PO editor ---.
|       +----> msgmerge ------> LANG.po ---->--------'                  |
|   .---'                                                               |
|   |                                                                   |
|   `-------------<---------------.                                     |
|                                 +--- New LANG.po <--------------------'
|   .--- LANG.gmo <--- msgfmt <---'
|   |
|   `---> install ---> /.../LANG/PACKAGE.mo ---.
|                                              +---> "Hello world!"
`-------> install ---> /.../bin/PROGRAM -------'
@end group
@end example
@end ifnothtml
976
977@cindex marking translatable strings
978As a programmer, the first step to bringing GNU @code{gettext}
979into your package is identifying, right in the C sources, those strings
980which are meant to be translatable, and those which are untranslatable.
This tedious job can be done a little more comfortably using Emacs PO
mode, but you can use any means familiar to you for modifying your
C sources.  Besides this, some other simple, standard changes are needed to
984properly initialize the translation library.  @xref{Sources}, for
985more information about all this.
986
987For newly written software the strings of course can and should be
988marked while writing it.  The @code{gettext} approach makes this
989very easy.  Simply put the following lines at the beginning of each file
990or in a central header file:
991
@example
@group
#define _(String) (String)
#define N_(String) String
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)
@end group
@end example
1000
1001@noindent
1002Doing this allows you to prepare the sources for internationalization.
1003Later when you feel ready for the step to use the @code{gettext} library
1004simply replace these definitions by the following:
1005
1006@cindex include file @file{libintl.h}
@example
@group
#include <libintl.h>
#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
@end group
@end example
1015
1016@cindex link with @file{libintl}
1017@cindex Linux
1018@noindent
1019and link against @file{libintl.a} or @file{libintl.so}.  Note that on
1020GNU systems, you don't need to link with @code{libintl} because the
1021@code{gettext} library functions are already contained in GNU libc.
1022That is all you have to change.
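
The following sketch shows how the macros above might be used together
with the library initialization.  It is only an illustration: the domain
name @code{example-package}, the directory @file{/usr/share/locale} and
the function names @code{init_i18n}, @code{greeting} and
@code{weekday_name} are assumptions made for this example.  A real
package normally obtains the domain and the directory from its build
system.

```c
#include <locale.h>
#include <libintl.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Hypothetical values, chosen only for this example.  */
#define EXAMPLE_DOMAIN "example-package"
#define EXAMPLE_LOCALEDIR "/usr/share/locale"

/* N_ only marks the strings as translatable.  A static initializer
   cannot call gettext, so the lookup is deferred until use.  */
static const char *const weekday_names[] = {
  N_("Monday"),
  N_("Tuesday")
};

/* Select the user's locale and the message catalog of this package.  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (EXAMPLE_DOMAIN, EXAMPLE_LOCALEDIR);
  textdomain (EXAMPLE_DOMAIN);
}

/* Looked up at run time; if no translation is found, gettext returns
   the original string.  */
const char *
greeting (void)
{
  return _("Hello, world!");
}

/* Translate a string that was marked with N_ earlier.  */
const char *
weekday_name (int index)
{
  return gettext (weekday_names[index]);
}
```

A program would call @code{init_i18n} once at startup, before the first
message is produced.  When no message catalog is installed for the
selected language, the functions simply return the English originals.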
1023
1024@cindex template PO file
1025@cindex files, @file{.pot}
1026Once the C sources have been modified, the @code{xgettext} program
1027is used to find and extract all translatable strings, and create a
1028PO template file out of all these.  This @file{@var{package}.pot} file
1029contains all original program strings.  It has sets of pointers to
1030exactly where in C sources each string is used.  All translations
1031are set to empty.  The letter @code{t} in @file{.pot} marks this as
1032a Template PO file, not yet oriented towards any particular language.
1033@xref{xgettext Invocation}, for more details about how one calls the
1034@code{xgettext} program.  If you are @emph{really} lazy, you might
be interested in working a lot more right away, and preparing the
1036whole distribution setup (@pxref{Maintainers}).  By doing so, you
1037spare yourself typing the @code{xgettext} command, as @code{make}
1038should now generate the proper things automatically for you!
1039
1040The first time through, there is no @file{@var{lang}.po} yet, so the
1041@code{msgmerge} step may be skipped and replaced by a mere copy of
1042@file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang}
1043represents the target language.  See @ref{Creating} for details.
1044
1045Then comes the initial translation of messages.  Translation in
1046itself is a whole matter, still exclusively meant for humans,
1047and whose complexity far overwhelms the level of this manual.
1048Nevertheless, a few hints are given in some other chapter of this
1049manual (@pxref{Translators}).  You will also find there indications
1050about how to contact translating teams, or becoming part of them,
1051for sharing your translating concerns with others who target the same
1052native language.
1053
1054While adding the translated messages into the @file{@var{lang}.po}
1055PO file, if you are not using one of the dedicated PO file editors
1056(@pxref{Editing}), you are on your own
1057for ensuring that your efforts fully respect the PO file format, and quoting
1058conventions (@pxref{PO Files}).  This is surely not an impossible task,
as this is the way many people handled PO files around 1995.
On the other hand, by using a PO file editor, most details
of the PO file format are taken care of for you, but you have to acquire
some familiarity with the PO file editor itself.
1063
1064If some common translations have already been saved into a compendium
1065PO file, translators may use PO mode for initializing untranslated
1066entries from the compendium, and also save selected translations into
1067the compendium, updating it (@pxref{Compendium}).  Compendium files
1068are meant to be exchanged between members of a given translation team.
1069
1070Programs, or packages of programs, are dynamic in nature: users write
bug reports and suggestions for improvement, and maintainers react by
1072modifying programs in various ways.  The fact that a package has
1073already been internationalized should not make maintainers shy
1074of adding new strings, or modifying strings already translated.
1075They just do their job the best they can.  For the Translation
1076Project to work smoothly, it is important that maintainers do not
1077carry translation concerns on their already loaded shoulders, and that
1078translators be kept as free as possible of programming concerns.
1079
The only concern maintainers should have is carefully marking new
strings as translatable, when they should be; they should not otherwise
worry about them being translated, as this will come in proper time.
1083Consequently, when programs and their strings are adjusted in various
1084ways by maintainers, and for matters usually unrelated to translation,
1085@code{xgettext} would construct @file{@var{package}.pot} files which are
1086evolving over time, so the translations carried by @file{@var{lang}.po}
1087are slowly fading out of date.
1088
1089@cindex evolution of packages
1090It is important for translators (and even maintainers) to understand
1091that package translation is a continuous process in the lifetime of a
1092package, and not something which is done once and for all at the start.
1093After an initial burst of translation activity for a given package,
1094interventions are needed once in a while, because here and there,
1095translated entries become obsolete, and new untranslated entries
1096appear, needing translation.
1097
1098The @code{msgmerge} program has the purpose of refreshing an already
1099existing @file{@var{lang}.po} file, by comparing it with a newer
1100@file{@var{package}.pot} template file, extracted by @code{xgettext}
1101out of recent C sources.  The refreshing operation adjusts all
1102references to C source locations for strings, since these strings
1103move as programs are modified.  Also, @code{msgmerge} comments out as
1104obsolete, in @file{@var{lang}.po}, those already translated entries
1105which are no longer used in the program sources (@pxref{Obsolete
1106Entries}).  It finally discovers new strings and inserts them in
1107the resulting PO file as untranslated entries (@pxref{Untranslated
1108Entries}).  @xref{msgmerge Invocation}, for more information about what
1109@code{msgmerge} really does.
1110
1111Whatever route or means taken, the goal is to obtain an updated
1112@file{@var{lang}.po} file offering translations for all strings.
1113
1114The temporal mobility, or fluidity of PO files, is an integral part of
1115the translation game, and should be well understood, and accepted.
1116People resisting it will have a hard time participating in the
1117Translation Project, or will give a hard time to other participants!  In
1118particular, maintainers should relax and include all available official
1119PO files in their distributions, even if these have not recently been
1120updated, without exerting pressure on the translator teams to get the
1121job done.  The pressure should rather come
1122from the community of users speaking a particular language, and
1123maintainers should consider themselves fairly relieved of any concern
1124about the adequacy of translation files.  On the other hand, translators
1125should reasonably try updating the PO files they are responsible for,
1126while the package is undergoing pretest, prior to an official
1127distribution.
1128
1129Once the PO file is complete and dependable, the @code{msgfmt} program
1130is used for turning the PO file into a machine-oriented format, which
1131may yield efficient retrieval of translations by the programs of the
1132package, whenever needed at runtime (@pxref{MO Files}).  @xref{msgfmt
1133Invocation}, for more information about all modes of execution
1134for the @code{msgfmt} program.
1135
1136Finally, the modified and marked C sources are compiled and linked
1137with the GNU @code{gettext} library, usually through the operation of
1138@code{make}, given a suitable @file{Makefile} exists for the project,
1139and the resulting executable is installed somewhere users will find it.
1140The MO files themselves should also be properly installed.  Given the
1141appropriate environment variables are set (@pxref{Setting the POSIX Locale}),
1142the program should localize itself automatically, whenever it executes.
1143
1144The remainder of this manual has the purpose of explaining in depth the various
1145steps outlined above.
1146
1147@node Users
1148@chapter The User's View
1149
1150Nowadays, when users log into a computer, they usually find that all
1151their programs show messages in their native language -- at least for
1152users of languages with an active free software community, like French or
1153German; to a lesser extent for languages with a smaller participation in
1154free software and the GNU project, like Hindi and Filipino.
1155
1156How does this work?  How can the user influence the language that is used
by the programs?  This chapter answers these questions.
1158
1159@menu
1160* System Installation::         Questions During Operating System Installation
1161* Setting the GUI Locale::      How to Specify the Locale Used by GUI Programs
1162* Setting the POSIX Locale::    How to Specify the Locale According to POSIX
1163* Working in a Windows console::  Obtaining good output in a Windows console
1164* Installing Localizations::    How to Install Additional Translations
1165@end menu
1166
1167@node System Installation
1168@section Operating System Installation
1169
1170The default language is often already specified during operating system
1171installation.  When the operating system is installed, the installer
1172typically asks for the language used for the installation process and,
1173separately, for the language to use in the installed system.  Some OS
1174installers only ask for the language once.
1175
1176This determines the system-wide default language for all users.  But the
1177installers often give the possibility to install extra localizations for
1178additional languages.  For example, the localizations of KDE (the K
1179Desktop Environment) and OpenOffice.org are often bundled separately,
1180as one installable package per language.
1181
1182At this point it is good to consider the intended use of the machine: If
1183it is a machine designated for personal use, additional localizations are
1184probably not necessary.  If, however, the machine is in use in an
1185organization or company that has international relationships, one can
1186consider the needs of guest users.  If you have a guest from abroad, for
a week, what could be their preferred locales?  It may be worth installing
1188these additional localizations ahead of time, since they cost only a bit
1189of disk space at this point.
1190
1191The system-wide default language is the locale configuration that is used
when a new user account is created.  But each user can have a locale
configuration that is different from the one of the other users of the
same machine.  Users can specify it, typically after the first login, as
1195described in the next section.
1196
1197@node Setting the GUI Locale
1198@section Setting the Locale Used by GUI Programs
1199
1200The immediately available programs in a user's desktop come from a group
1201of programs called a ``desktop environment''; it usually includes the window
1202manager, a web browser, a text editor, and more.  The most common free
1203desktop environments are KDE, GNOME, and Xfce.
1204
1205The locale used by GUI programs of the desktop environment can be specified
1206in a configuration screen called ``control center'', ``language settings''
1207or ``country settings''.
1208
1209Individual GUI programs that are not part of the desktop environment can
1210have their locale specified either in a settings panel, or through environment
1211variables.
1212
1213For some programs, it is possible to specify the locale through environment
1214variables, possibly even to a different locale than the desktop's locale.
1215This means, instead of starting a program through a menu or from the file
1216system, you can start it from the command-line, after having set some
1217environment variables.  The environment variables can be those specified
1218in the next section (@ref{Setting the POSIX Locale}); for some versions of
1219KDE, however, the locale is specified through a variable @code{KDE_LANG},
1220rather than @code{LANG} or @code{LC_ALL}.
1221
1222@node Setting the POSIX Locale
1223@section Setting the Locale through Environment Variables
1224
1225As a user, if your language has been installed for this package, in the
1226simplest case, you only have to set the @code{LANG} environment variable
1227to the appropriate @samp{@var{ll}_@var{CC}} combination.  For example,
1228let's suppose that you speak German and live in Germany.  At the shell
1229prompt, merely execute
1230@w{@samp{setenv LANG de_DE}} (in @code{csh}),
1231@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}) or
1232@w{@samp{export LANG=de_DE}} (in @code{bash}).  This can be done from your
1233@file{.login} or @file{.profile} file, once and for all.
1234
1235@menu
* Locale Names::                What a Locale Specification Looks Like
* Locale Environment Variables::  Which Environment Variable Specifies What
1238* The LANGUAGE variable::       How to Specify a Priority List of Languages
1239@end menu
1240
1241@node Locale Names
1242@subsection Locale Names
1243
1244A locale name usually has the form @samp{@var{ll}_@var{CC}}.  Here
1245@samp{@var{ll}} is an @w{ISO 639} two-letter language code, and
1246@samp{@var{CC}} is an @w{ISO 3166} two-letter country code.  For example,
1247for German in Germany, @var{ll} is @code{de}, and @var{CC} is @code{DE}.
1248You find a list of the language codes in appendix @ref{Language Codes} and
1249a list of the country codes in appendix @ref{Country Codes}.
1250
1251You might think that the country code specification is redundant.  But in
1252fact, some languages have dialects in different countries.  For example,
1253@samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil.  The country
1254code serves to distinguish the dialects.
1255
1256Many locale names have an extended syntax
1257@samp{@var{ll}_@var{CC}.@var{encoding}} that also specifies the character
encoding.  These are in use because, between 2000 and 2005, most users
switched to locales in UTF-8 encoding.  For example, the German locale on
1260glibc systems is nowadays @samp{de_DE.UTF-8}.  The older name @samp{de_DE}
1261still refers to the German locale as of 2000 that stores characters in
1262ISO-8859-1 encoding -- a text encoding that cannot even accommodate the Euro
1263currency sign.
1264
1265Some locale names use @samp{@var{ll}_@var{CC}@@@var{variant}} instead of
1266@samp{@var{ll}_@var{CC}}.  The @samp{@@@var{variant}} can denote any kind of
characteristic that is not already implied by the language @var{ll} and
1268the country @var{CC}.  It can denote a particular monetary unit.  For example,
1269on glibc systems, @samp{de_DE@@euro} denotes the locale that uses the Euro
1270currency, in contrast to the older locale @samp{de_DE} which implies the use
1271of the currency before 2002.  It can also denote a dialect of the language,
1272or the script used to write text (for example, @samp{sr_RS@@latin} uses the
1273Latin script, whereas @samp{sr_RS} uses the Cyrillic script to write Serbian),
1274or the orthography rules, or similar.
1275
1276On other systems, some variations of this scheme are used, such as
1277@samp{@var{ll}}.  You can get the list of locales supported by your system
1278for your language by running the command @samp{locale -a | grep '^@var{ll}'}.
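
To make the naming scheme concrete, here is a minimal sketch that splits
a locale name into its parts.  It assumes the general shape
@samp{@var{ll}_@var{CC}.@var{encoding}@@@var{variant}}, in which every
part after @var{ll} is optional.  The structure @code{locale_parts} and
the function @code{split_locale_name} are hypothetical names, not part of
any library, and the fixed buffer sizes are arbitrary.

```c
#include <string.h>

/* Hypothetical container for the parts of a locale name.  */
struct locale_parts
{
  char language[16];
  char country[16];
  char encoding[32];
  char variant[32];
};

/* Copy the bytes in [START, END) into DST, truncating to SIZE - 1.  */
static void
copy_span (char *dst, size_t size, const char *start, const char *end)
{
  size_t length = (size_t) (end - start);

  if (length >= size)
    length = size - 1;
  memcpy (dst, start, length);
  dst[length] = '\0';
}

/* Split NAME, assumed to look like ll[_CC][.encoding][@variant].
   Parts that are absent are left as empty strings.  */
void
split_locale_name (const char *name, struct locale_parts *parts)
{
  const char *end = name + strlen (name);
  const char *at = strchr (name, '@');
  const char *base_end = at != NULL ? at : end;
  /* Search for the encoding only in front of the variant.  */
  const char *dot = memchr (name, '.', (size_t) (base_end - name));
  const char *ll_cc_end = dot != NULL ? dot : base_end;
  const char *underscore = memchr (name, '_', (size_t) (ll_cc_end - name));
  const char *language_end = underscore != NULL ? underscore : ll_cc_end;

  memset (parts, 0, sizeof *parts);
  copy_span (parts->language, sizeof parts->language, name, language_end);
  if (underscore != NULL)
    copy_span (parts->country, sizeof parts->country,
               underscore + 1, ll_cc_end);
  if (dot != NULL)
    copy_span (parts->encoding, sizeof parts->encoding, dot + 1, base_end);
  if (at != NULL)
    copy_span (parts->variant, sizeof parts->variant, at + 1, end);
}
```

For @samp{de_DE.UTF-8} this yields the language @code{de}, the country
@code{DE} and the encoding @code{UTF-8}; for @samp{sr_RS@@latin} it
yields the variant @code{latin} and an empty encoding.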
1279
1280There is also a special locale, called @samp{C}.
1281@c Don't mention that this locale also has the name "POSIX". When we talk about
1282@c the "POSIX locale", we mean the "locale as specified in the POSIX way", and
1283@c mentioning a locale called "POSIX" would bring total confusion.
1284When it is used, it disables all localization: in this locale, all programs
1285standardized by POSIX use English messages and an unspecified character
1286encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on
1287the operating system).
1288
1289@node Locale Environment Variables
1290@subsection Locale Environment Variables
1291@cindex setting up @code{gettext} at run time
1292@cindex selecting message language
1293@cindex language selection
1294
1295A locale is composed of several @emph{locale categories}, see @ref{Aspects}.
1296When a program looks up locale dependent values, it does this according to
1297the following environment variables, in priority order:
1298
1299@enumerate
1300@vindex LANGUAGE@r{, environment variable}
1301@item @code{LANGUAGE}
1302@vindex LC_ALL@r{, environment variable}
1303@item @code{LC_ALL}
1304@vindex LC_CTYPE@r{, environment variable}
1305@vindex LC_NUMERIC@r{, environment variable}
1306@vindex LC_TIME@r{, environment variable}
1307@vindex LC_COLLATE@r{, environment variable}
1308@vindex LC_MONETARY@r{, environment variable}
1309@vindex LC_MESSAGES@r{, environment variable}
1310@item @code{LC_xxx}, according to selected locale category:
1311@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
1312@code{LC_MONETARY}, @code{LC_MESSAGES}, ...
1313@vindex LANG@r{, environment variable}
1314@item @code{LANG}
1315@end enumerate
1316
1317Variables whose value is set but is empty are ignored in this lookup.
1318
1319@code{LANG} is the normal environment variable for specifying a locale.
1320As a user, you normally set this variable (unless some of the other variables
1321have already been set by the system, in @file{/etc/profile} or similar
1322initialization files).
1323
1324@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
1325@code{LC_MONETARY}, @code{LC_MESSAGES}, and so on, are the environment
variables meant to override @code{LANG}, each affecting a single locale
1327category only.  For example, assume you are a Swedish user in Spain, and you
1328want your programs to handle numbers and dates according to Spanish
1329conventions, and only the messages should be in Swedish.  Then you could
1330create a locale named @samp{sv_ES} or @samp{sv_ES.UTF-8} by use of the
1331@code{localedef} program.  But it is simpler, and achieves the same effect,
1332to set the @code{LANG} variable to @code{es_ES.UTF-8} and the
1333@code{LC_MESSAGES} variable to @code{sv_SE.UTF-8}; these two locales come
1334already preinstalled with the operating system.
1335
1336@code{LC_ALL} is an environment variable that overrides all of these.
1337It is typically used in scripts that run particular programs.  For example,
1338@code{configure} scripts generated by GNU autoconf use @code{LC_ALL} to make
1339sure that the configuration tests don't operate in locale dependent ways.
1340
1341Some systems, unfortunately, set @code{LC_ALL} in @file{/etc/profile} or in
1342similar initialization files.  As a user, you therefore have to unset this
1343variable if you want to set @code{LANG} and optionally some of the other
1344@code{LC_xxx} variables.
1345
1346The @code{LANGUAGE} variable is described in the next subsection.
1347
1348@node The LANGUAGE variable
1349@subsection Specifying a Priority List of Languages
1350
1351Not all programs have translations for all languages.  By default, an
1352English message is shown in place of a nonexistent translation.  If you
1353understand other languages, you can set up a priority list of languages.
1354This is done through a different environment variable, called
1355@code{LANGUAGE}.  GNU @code{gettext} gives preference to @code{LANGUAGE}
1356over @code{LC_ALL} and @code{LANG} for the purpose of message handling,
1357but you still need to have @code{LANG} (or @code{LC_ALL}) set to the primary
1358language; this is required by other parts of the system libraries.
For example, some Swedish users who would rather read translations in
German than in English when Swedish is not available set @code{LANGUAGE}
to @samp{sv:de} while leaving @code{LANG} set to @samp{sv_SE}.
1362
Special advice for Norwegian users: The language code for Norwegian
bokm@ringaccent{a}l changed from @samp{no} to @samp{nb} in 2003.
1365During the transition period, while some message catalogs for this language
1366are installed under @samp{nb} and some older ones under @samp{no}, it is
1367recommended for Norwegian users to set @code{LANGUAGE} to @samp{nb:no} so that
1368both newer and older translations are used.
1369
1370In the @code{LANGUAGE} environment variable, but not in the other
1371environment variables, @samp{@var{ll}_@var{CC}} combinations can be
1372abbreviated as @samp{@var{ll}} to denote the language's main dialect.
1373For example, @samp{de} is equivalent to @samp{de_DE} (German as spoken in
1374Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as spoken in Portugal)
1375in this context.
1376
1377Note: The variable @code{LANGUAGE} is ignored if the locale is set to
1378@samp{C}.  In other words, you have to first enable localization, by setting
1379@code{LANG} (or @code{LC_ALL}) to a value other than @samp{C}, before you can
1380use a language priority list through the @code{LANGUAGE} variable.
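
A priority list such as @samp{sv:de} is a sequence of language codes
separated by colons.  The following C sketch shows one way to walk through
such a list.  The function name @code{language_list_entry} is hypothetical
and this is not the parser used by GNU @code{gettext}; in particular, it
does not model the rule that @code{LANGUAGE} is ignored in the @samp{C}
locale, nor the expansion of @samp{@var{ll}} to its main dialect.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the N-th entry (counting from 0) of the
   colon-separated priority list LIST into BUF, which has room for
   BUFLEN bytes.  Returns 1 on success, and 0 if there is no such
   entry, if the entry is empty, or if it does not fit into BUF.  */
int
language_list_entry (const char *list, size_t n, char *buf, size_t buflen)
{
  const char *start = list;

  for (;;)
    {
      const char *end = strchr (start, ':');
      size_t len = (end != NULL) ? (size_t) (end - start) : strlen (start);

      if (n == 0)
        {
          if (len == 0 || len >= buflen)
            return 0;
          memcpy (buf, start, len);
          buf[len] = '\0';
          return 1;
        }
      if (end == NULL)
        return 0;               /* The list has fewer than N+1 entries.  */
      start = end + 1;
      n--;
    }
}
```

For the list @samp{nb:no}, entry 0 is @samp{nb} and entry 1 is @samp{no};
asking for entry 2 fails, at which point a program would fall back to the
untranslated message.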
1381
1382@node Working in a Windows console
1383@section Obtaining good output in a Windows console
1384@cindex Windows
1385@cindex ANSI encoding
1386@cindex OEM encoding
1387@vindex OUTPUT_CHARSET@r{, environment variable}
1388
1389On Windows, consoles such as the one started by the @code{cmd.exe}
1390program do input and output in an encoding, called ``OEM code page'',
1391that is different from the encoding that text-mode programs usually use,
1392called ``ANSI code page''.  (Note: This problem does not exist for
1393Cygwin consoles; these consoles do input and output in the UTF-8
1394encoding.)  As a workaround, you may request that the programs produce
1395output in this ``OEM'' encoding.  To do so, set the environment variable
1396@code{OUTPUT_CHARSET} to the ``OEM'' encoding, through a command such as
1397@smallexample
1398set OUTPUT_CHARSET=CP850
1399@end smallexample
1400Note: This has an effect only on strings looked up in message catalogs;
1401other categories of text are usually not affected by this setting.
Note also that this environment variable affects output sent to a
file or to a pipe as well; output to a file is most often expected to be
in the ``ANSI'' or in the UTF-8 encoding.
1405
1406Here are examples of the ``ANSI'' and ``OEM'' code pages:
1407
1408@multitable @columnfractions .5 .25 .25
1409@headitem Territories @tie{} @tab @tie{} ANSI encoding @tie{} @tab @tie{} OEM encoding
1410@item Western Europe @tie{} @tab @tie{} CP1252 @tie{} @tab @tie{} CP850
1411@item Slavic countries (Latin 2) @tie{} @tab @tie{} CP1250 @tie{} @tab @tie{} CP852
1412@item Baltic countries @tie{} @tab @tie{} CP1257 @tie{} @tab @tie{} CP775
1413@item Russia @tie{} @tab @tie{} CP1251 @tie{} @tab @tie{} CP866
1414@end multitable
1415
1416@node Installing Localizations
1417@section Installing Translations for Particular Programs
1418@cindex Translation Matrix
1419@cindex available translations
1420
1421Languages are not equally well supported in all packages using GNU
1422@code{gettext}, and more translations are added over time.  Usually, you
1423use the translations that are shipped with the operating system
1424or with particular packages that you install afterwards.  But you can also
install newer localizations directly.  To do this, you need to
understand where each localization file is stored on the file system.
1427
1428@cindex @file{ABOUT-NLS} file
1429For programs that participate in the Translation Project, you can start
1430looking for translations here:
1431@url{https://translationproject.org/team/index.html}.
1432
1433For programs that are part of the KDE project, the starting point is:
1434@url{https://l10n.kde.org/}.
1435
1436For programs that are part of the GNOME project, the starting point is:
1437@url{https://wiki.gnome.org/TranslationProject}.
1438
1439For other programs, you may check whether the program's source code package
1440contains some @file{@var{ll}.po} files; often they are kept together in a
1441directory called @file{po/}.  Each @file{@var{ll}.po} file contains the
message translations for the language whose abbreviation is @var{ll}.
1443
1444@node PO Files
1445@chapter The Format of PO Files
1446@cindex PO files' format
1447@cindex file format, @file{.po}
1448
The GNU @code{gettext} toolset helps programmers and translators
in producing, updating and using translation files, mainly those
1451PO files which are textual, editable files.  This chapter explains
1452the format of PO files.
1453
1454A PO file is made up of many entries, each entry holding the relation
1455between an original untranslated string and its corresponding
1456translation.  All entries in a given PO file usually pertain
1457to a single project, and all translations are expressed in a single
1458target language.  One PO file @dfn{entry} has the following schematic
1459structure:
1460
1461@example
1462@var{white-space}
1463#  @var{translator-comments}
1464#. @var{extracted-comments}
1465#: @var{reference}@dots{}
1466#, @var{flag}@dots{}
1467#| msgid @var{previous-untranslated-string}
1468msgid @var{untranslated-string}
1469msgstr @var{translated-string}
1470@end example
1471
1472The general structure of a PO file should be well understood by
1473the translator.  When using PO mode, very little has to be known
1474about the format details, as PO mode takes care of them for her.
1475
1476A simple entry can look like this:
1477
1478@example
1479#: lib/error.c:116
1480msgid "Unknown system error"
1481msgstr "Error desconegut del sistema"
1482@end example
1483
1484@cindex comments, translator
1485@cindex comments, automatic
1486@cindex comments, extracted
1487Entries begin with some optional white space.  Usually, when generated
1488through GNU @code{gettext} tools, there is exactly one blank line
1489between entries.  Then comments follow, on lines all starting with the
character @code{#}.  There are two kinds of comments.  The
@var{translator comments} have some white space immediately following
the @code{#}; they are created and maintained exclusively by the
translator.  The @var{automatic comments} have some non-white character
just after the @code{#}; they are created and maintained automatically
by GNU @code{gettext} tools.  Comment lines
1496starting with @code{#.} contain comments given by the programmer, directed
1497at the translator; these comments are called @var{extracted comments}
1498because the @code{xgettext} program extracts them from the program's
1499source code.  Comment lines starting with @code{#:} contain references to
1500the program's source code.  Comment lines starting with @code{#,} contain
1501flags; more about these below.  Comment lines starting with @code{#|}
1502contain the previous untranslated string for which the translator gave
1503a translation.
1504
1505All comments, of either kind, are optional.
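
The comment kinds described above are told apart by the character that
follows the @code{#}.  A minimal C sketch of that classification might look
like this.  The enumeration and the function name
@code{classify_po_comment} are hypothetical and are not part of any GNU
@code{gettext} library interface; treating a bare @code{#} with nothing
after it as a translator comment is an assumption.

```c
/* Kinds of PO file lines, as far as comments are concerned.  */
enum po_comment_kind
{
  PO_NOT_A_COMMENT,       /* Line does not start with '#'.  */
  PO_TRANSLATOR_COMMENT,  /* "#" followed by white space.  */
  PO_EXTRACTED_COMMENT,   /* "#." comment given by the programmer.  */
  PO_REFERENCE,           /* "#:" reference to the source code.  */
  PO_FLAGS,               /* "#," list of flags.  */
  PO_PREVIOUS,            /* "#|" previous untranslated string.  */
  PO_OTHER_AUTOMATIC      /* "#" followed by another non-white character.  */
};

/* Hypothetical helper: classify one line of a PO file.  */
enum po_comment_kind
classify_po_comment (const char *line)
{
  if (line[0] != '#')
    return PO_NOT_A_COMMENT;

  switch (line[1])
    {
    case '\0': case ' ': case '\t': case '\n': case '\r':
      /* Assumption: a bare "#" counts as a translator comment.  */
      return PO_TRANSLATOR_COMMENT;
    case '.':
      return PO_EXTRACTED_COMMENT;
    case ':':
      return PO_REFERENCE;
    case ',':
      return PO_FLAGS;
    case '|':
      return PO_PREVIOUS;
    default:
      return PO_OTHER_AUTOMATIC;
    }
}
```

Applied to the simple entry shown earlier, the line
@code{#: lib/error.c:116} is classified as a reference, while the
@code{msgid} and @code{msgstr} lines are not comments at all.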
1506
1507@kwindex msgid
1508@kwindex msgstr
1509After white space and comments, entries show two strings, namely
1510first the untranslated string as it appears in the original program
1511sources, and then, the translation of this string.  The original
1512string is introduced by the keyword @code{msgid}, and the translation,
1513by @code{msgstr}.  The two strings, untranslated and translated,
1514are quoted in various ways in the PO file, using @code{"}
1515delimiters and @code{\} escapes, but the translator does not really
1516have to pay attention to the precise quoting format, as PO mode fully
1517takes care of quoting for her.
1518
1519The @code{msgid} strings, as well as automatic comments, are produced
1520and managed by other GNU @code{gettext} tools, and PO mode does not
provide means for the translator to alter these.  The most she can
do is delete them, and only by deleting the whole entry.
1523On the other hand, the @code{msgstr} string, as well as translator
1524comments, are really meant for the translator, and PO mode gives her
1525the full control she needs.
1526
1527The comment lines beginning with @code{#,} are special because they are
1528not completely ignored by the programs as comments generally are.  The
comma-separated list of @var{flag}s is used by the @code{msgfmt}
program to give the user some better diagnostic messages.  Currently
there are two forms of flags defined, the @code{fuzzy} flag and the
per-language format flags:
1532
1533@table @code
1534@item fuzzy
1535@kwindex fuzzy@r{ flag}
1536This flag can be generated by the @code{msgmerge} program or it can be
1537inserted by the translator herself.  It shows that the @code{msgstr}
1538string might not be a correct translation (anymore).  Only the translator
1539can judge if the translation requires further modification, or is
1540acceptable as is.  Once satisfied with the translation, she then removes
this @code{fuzzy} attribute.  The @code{msgmerge} program inserts this
flag when it has combined the @code{msgid} and @code{msgstr} entries
only after a fuzzy search.  @xref{Fuzzy Entries}.
1544
1545@item c-format
1546@kwindex c-format@r{ flag}
1547@itemx no-c-format
1548@kwindex no-c-format@r{ flag}
1549These flags should not be added by a human.  Instead only the
1550@code{xgettext} program adds them.  In an automated PO file processing
1551system as proposed here, the user's changes would be thrown away again as
1552soon as the @code{xgettext} program generates a new template file.
1553
1554The @code{c-format} flag indicates that the untranslated string and the
1555translation are supposed to be C format strings.  The @code{no-c-format}
1556flag indicates that they are not C format strings, even though the untranslated
1557string happens to look like a C format string (with @samp{%} directives).
1558
When the @code{c-format} flag is given for a string, the @code{msgfmt}
program does some more tests to check the validity of the translation.
1561@xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}.
1562
1563@item objc-format
1564@kwindex objc-format@r{ flag}
1565@itemx no-objc-format
1566@kwindex no-objc-format@r{ flag}
1567Likewise for Objective C, see @ref{objc-format}.
1568
1569@item python-format
1570@kwindex python-format@r{ flag}
1571@itemx no-python-format
1572@kwindex no-python-format@r{ flag}
1573Likewise for Python, see @ref{python-format}.
1574
1575@item python-brace-format
1576@kwindex python-brace-format@r{ flag}
1577@itemx no-python-brace-format
1578@kwindex no-python-brace-format@r{ flag}
1579Likewise for Python brace, see @ref{python-format}.
1580
1581@item java-format
1582@kwindex java-format@r{ flag}
1583@itemx no-java-format
1584@kwindex no-java-format@r{ flag}
1585Likewise for Java @code{MessageFormat} format strings, see @ref{java-format}.
1586
1587@item java-printf-format
1588@kwindex java-printf-format@r{ flag}
1589@itemx no-java-printf-format
1590@kwindex no-java-printf-format@r{ flag}
1591Likewise for Java @code{printf} format strings, see @ref{java-format}.
1592
1593@item csharp-format
1594@kwindex csharp-format@r{ flag}
1595@itemx no-csharp-format
1596@kwindex no-csharp-format@r{ flag}
1597Likewise for C#, see @ref{csharp-format}.
1598
1599@item javascript-format
1600@kwindex javascript-format@r{ flag}
1601@itemx no-javascript-format
1602@kwindex no-javascript-format@r{ flag}
1603Likewise for JavaScript, see @ref{javascript-format}.
1604
1605@item scheme-format
1606@kwindex scheme-format@r{ flag}
1607@itemx no-scheme-format
1608@kwindex no-scheme-format@r{ flag}
1609Likewise for Scheme, see @ref{scheme-format}.
1610
1611@item lisp-format
1612@kwindex lisp-format@r{ flag}
1613@itemx no-lisp-format
1614@kwindex no-lisp-format@r{ flag}
1615Likewise for Lisp, see @ref{lisp-format}.
1616
1617@item elisp-format
1618@kwindex elisp-format@r{ flag}
1619@itemx no-elisp-format
1620@kwindex no-elisp-format@r{ flag}
1621Likewise for Emacs Lisp, see @ref{elisp-format}.
1622
1623@item librep-format
1624@kwindex librep-format@r{ flag}
1625@itemx no-librep-format
1626@kwindex no-librep-format@r{ flag}
1627Likewise for librep, see @ref{librep-format}.
1628
1629@item ruby-format
1630@kwindex ruby-format@r{ flag}
1631@itemx no-ruby-format
1632@kwindex no-ruby-format@r{ flag}
1633Likewise for Ruby, see @ref{ruby-format}.
1634
1635@item sh-format
1636@kwindex sh-format@r{ flag}
1637@itemx no-sh-format
1638@kwindex no-sh-format@r{ flag}
1639Likewise for Shell, see @ref{sh-format}.
1640
1641@item awk-format
1642@kwindex awk-format@r{ flag}
1643@itemx no-awk-format
1644@kwindex no-awk-format@r{ flag}
1645Likewise for awk, see @ref{awk-format}.
1646
1647@item lua-format
1648@kwindex lua-format@r{ flag}
1649@itemx no-lua-format
1650@kwindex no-lua-format@r{ flag}
1651Likewise for Lua, see @ref{lua-format}.
1652
1653@item object-pascal-format
1654@kwindex object-pascal-format@r{ flag}
1655@itemx no-object-pascal-format
1656@kwindex no-object-pascal-format@r{ flag}
1657Likewise for Object Pascal, see @ref{object-pascal-format}.
1658
1659@item smalltalk-format
1660@kwindex smalltalk-format@r{ flag}
1661@itemx no-smalltalk-format
1662@kwindex no-smalltalk-format@r{ flag}
1663Likewise for Smalltalk, see @ref{smalltalk-format}.
1664
1665@item qt-format
1666@kwindex qt-format@r{ flag}
1667@itemx no-qt-format
1668@kwindex no-qt-format@r{ flag}
1669Likewise for Qt, see @ref{qt-format}.
1670
1671@item qt-plural-format
1672@kwindex qt-plural-format@r{ flag}
1673@itemx no-qt-plural-format
1674@kwindex no-qt-plural-format@r{ flag}
1675Likewise for Qt plural forms, see @ref{qt-plural-format}.
1676
1677@item kde-format
1678@kwindex kde-format@r{ flag}
1679@itemx no-kde-format
1680@kwindex no-kde-format@r{ flag}
1681Likewise for KDE, see @ref{kde-format}.
1682
1683@item boost-format
1684@kwindex boost-format@r{ flag}
1685@itemx no-boost-format
1686@kwindex no-boost-format@r{ flag}
1687Likewise for Boost, see @ref{boost-format}.
1688
1689@item tcl-format
1690@kwindex tcl-format@r{ flag}
1691@itemx no-tcl-format
1692@kwindex no-tcl-format@r{ flag}
1693Likewise for Tcl, see @ref{tcl-format}.
1694
1695@item perl-format
1696@kwindex perl-format@r{ flag}
1697@itemx no-perl-format
1698@kwindex no-perl-format@r{ flag}
1699Likewise for Perl, see @ref{perl-format}.
1700
1701@item perl-brace-format
1702@kwindex perl-brace-format@r{ flag}
1703@itemx no-perl-brace-format
1704@kwindex no-perl-brace-format@r{ flag}
1705Likewise for Perl brace, see @ref{perl-format}.
1706
1707@item php-format
1708@kwindex php-format@r{ flag}
1709@itemx no-php-format
1710@kwindex no-php-format@r{ flag}
1711Likewise for PHP, see @ref{php-format}.
1712
1713@item gcc-internal-format
1714@kwindex gcc-internal-format@r{ flag}
1715@itemx no-gcc-internal-format
1716@kwindex no-gcc-internal-format@r{ flag}
1717Likewise for the GCC sources, see @ref{gcc-internal-format}.
1718
1719@item gfc-internal-format
1720@kwindex gfc-internal-format@r{ flag}
1721@itemx no-gfc-internal-format
1722@kwindex no-gfc-internal-format@r{ flag}
1723Likewise for the GNU Fortran Compiler sources, see @ref{gfc-internal-format}.
1724
1725@item ycp-format
1726@kwindex ycp-format@r{ flag}
1727@itemx no-ycp-format
1728@kwindex no-ycp-format@r{ flag}
1729Likewise for YCP, see @ref{ycp-format}.
1730
1731@end table
1732
1733@kwindex msgctxt
1734@cindex context, in PO files
1735It is also possible to have entries with a context specifier. They look like
1736this:
1737
1738@example
1739@var{white-space}
1740#  @var{translator-comments}
1741#. @var{extracted-comments}
1742#: @var{reference}@dots{}
1743#, @var{flag}@dots{}
1744#| msgctxt @var{previous-context}
1745#| msgid @var{previous-untranslated-string}
1746msgctxt @var{context}
1747msgid @var{untranslated-string}
1748msgstr @var{translated-string}
1749@end example
1750
1751The context serves to disambiguate messages with the same
1752@var{untranslated-string}.  It is possible to have several entries with
1753the same @var{untranslated-string} in a PO file, provided that they each
1754have a different @var{context}.  Note that an empty @var{context} string
1755and an absent @code{msgctxt} line do not mean the same thing.
1756
1757@kwindex msgid_plural
1758@cindex plural forms, in PO files
A different kind of entry is used for translations which involve
1760plural forms.
1761
1762@example
1763@var{white-space}
1764#  @var{translator-comments}
1765#. @var{extracted-comments}
1766#: @var{reference}@dots{}
1767#, @var{flag}@dots{}
1768#| msgid @var{previous-untranslated-string-singular}
1769#| msgid_plural @var{previous-untranslated-string-plural}
1770msgid @var{untranslated-string-singular}
1771msgid_plural @var{untranslated-string-plural}
1772msgstr[0] @var{translated-string-case-0}
1773...
1774msgstr[N] @var{translated-string-case-n}
1775@end example
1776
1777Such an entry can look like this:
1778
1779@example
1780#: src/msgcmp.c:338 src/po-lex.c:699
1781#, c-format
1782msgid "found %d fatal error"
1783msgid_plural "found %d fatal errors"
1784msgstr[0] "s'ha trobat %d error fatal"
1785msgstr[1] "s'han trobat %d errors fatals"
1786@end example
1787
1788Here also, a @code{msgctxt} context can be specified before @code{msgid},
1789like above.
1790
1791Here, additional kinds of flags can be used:
1792
1793@table @code
1794@item range:
1795@kwindex range:@r{ flag}
1796This flag is followed by a range of non-negative numbers, using the syntax
1797@code{range: @var{minimum-value}..@var{maximum-value}}.  It designates the
1798possible values that the numeric parameter of the message can take.  In some
1799languages, translators may produce slightly better translations if they know
1800that the value can only take on values between 0 and 10, for example.
1801@end table
1802
The @var{previous-untranslated-string} is optionally inserted by the
@code{msgmerge} program, at the same time as it marks a message fuzzy.
It helps the translator to see which changes the developers made
to the @var{untranslated-string}.
1807
1808It happens that some lines, usually whitespace or comments, follow the
1809very last entry of a PO file.  Such lines are not part of any entry,
1810and will be dropped when the PO file is processed by the tools, or may
1811disturb some PO file editors.
1812
1813The remainder of this section may be safely skipped by those using
1814a PO file editor, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file.  On the other hand, those
wishing to modify PO files by hand should continue reading carefully.
1817
1818An empty @var{untranslated-string} is reserved to contain the header
1819entry with the meta information (@pxref{Header Entry}).  This header
1820entry should be the first entry of the file.  The empty
1821@var{untranslated-string} is reserved for this purpose and must
1822not be used anywhere else.
1823
1824Each of @var{untranslated-string} and @var{translated-string} respects
1825the C syntax for a character string, including the surrounding quotes
1826and embedded backslashed escape sequences.  When the time comes
1827to write multi-line strings, one should not use escaped newlines.
1828Instead, a closing quote should follow the last character on the
1829line to be continued, and an opening quote should resume the string
1830at the beginning of the following PO file line.  For example:
1831
1832@example
1833msgid ""
1834"Here is an example of how one might continue a very long string\n"
1835"for the common case the string represents multi-line output.\n"
1836@end example
1837
1838@noindent
1839In this example, the empty string is used on the first line, to
1840allow better alignment of the @code{H} from the word @samp{Here}
1841over the @code{f} from the word @samp{for}.  In this example, the
1842@code{msgid} keyword is followed by three strings, which are meant
1843to be concatenated.  Concatenating the empty string does not change
1844the resulting overall string, but it is a way for us to comply with
1845the necessity of @code{msgid} to be followed by a string on the same
1846line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner disposition.  The empty string could have
been omitted, but only if the string starting with @samp{Here} was
moved up to the first line, right after @code{msgid}.@footnote{This
limitation is not imposed by GNU @code{gettext}, but is for compatibility
with the @code{msgfmt} implementation on Solaris.}  Nor was it really
necessary to switch between the last two quoted strings immediately after
the newline @samp{\n}; the switch could have occurred after @emph{any}
other character.  We just did it this way because it is neater.
1855
1856@cindex newlines in PO files
1857One should carefully distinguish between end of lines marked as
1858@samp{\n} @emph{inside} quotes, which are part of the represented
1859string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
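
Because PO strings follow the C syntax, the same concatenation rule can be
checked with a C compiler.  The sketch below writes the @code{msgid} from
the example above once as three adjacent string literals and once as a
single literal.  The variable and function names are hypothetical and
chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* The msgid written the way it appears in the PO file: an empty string
   followed by two more strings, one per PO file line.  */
const char split_msgid[] = ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* The same msgid written as one single string.  */
const char joined_msgid[] =
  "Here is an example of how one might continue a very long string\nfor the common case the string represents multi-line output.\n";

/* Return 1 if both spellings denote the same string, else 0.  */
int
msgids_are_identical (void)
{
  return strcmp (split_msgid, joined_msgid) == 0;
}

/* Count the newline characters that are part of the string S.  */
size_t
count_newlines (const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; s++)
    if (*s == '\n')
      count++;
  return count;
}
```

Both spellings denote the same string, and it contains exactly two
newlines, namely the two @samp{\n} escapes inside the quotes.  The line
breaks between the quoted pieces contribute nothing.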
1861
1862@cindex comments in PO files
1863Outside strings, white lines and comments may be used freely.
1864Comments start at the beginning of a line with @samp{#} and extend
1865until the end of the PO file line.  Comments written by translators
1866should have the initial @samp{#} immediately followed by some white
1867space.  If the @samp{#} is not immediately followed by white space,
1868this comment is most likely generated and managed by specialized GNU
1869tools, and might disappear or be replaced unexpectedly when the PO
1870file is given to @code{msgmerge}.
1871
1872@node Sources
1873@chapter Preparing Program Sources
1874@cindex preparing programs for translation
1875
1876@c FIXME: Rewrite (the whole chapter).
1877
1878For the programmer, changes to the C source code fall into three
1879categories.  First, you have to make the localization functions
1880known to all modules needing message translation.  Second, you should
1881properly trigger the operation of GNU @code{gettext} when the program
1882initializes, usually from the @code{main} function.  Last, you should
1883identify, adjust and mark all constant strings in your program
1884needing translation.
1885
1886@menu
1887* Importing::                   Importing the @code{gettext} declaration
1888* Triggering::                  Triggering @code{gettext} Operations
1889* Preparing Strings::           Preparing Translatable Strings
1890* Mark Keywords::               How Marks Appear in Sources
1891* Marking::                     Marking Translatable Strings
1892* c-format Flag::               Telling something about the following string
1893* Special cases::               Special Cases of Translatable Strings
1894* Bug Report Address::          Letting Users Report Translation Bugs
1895* Names::                       Marking Proper Names for Translation
1896* Libraries::                   Preparing Library Sources
1897@end menu
1898
1899@node Importing
1900@section Importing the @code{gettext} declaration
1901
1902Presuming that your set of programs, or package, has been adjusted
1903so all needed GNU @code{gettext} files are available, and your
1904@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
1905having translated C strings should contain the line:
1906
1907@cindex include file @file{libintl.h}
1908@example
1909#include <libintl.h>
1910@end example
1911
1912Similarly, each C module containing @code{printf()}/@code{fprintf()}/...
1913calls with a format string that could be a translated C string (even if
1914the C string comes from a different C module) should contain the line:
1915
1916@example
1917#include <libintl.h>
1918@end example
1919
1920@node Triggering
1921@section Triggering @code{gettext} Operations
1922
1923@cindex initialization
1924The initialization of locale data should be done with more or less
1925the same code in every program, as demonstrated below:
1926
1927@example
1928@group
1929int
1930main (int argc, char *argv[])
1931@{
1932  @dots{}
1933  setlocale (LC_ALL, "");
1934  bindtextdomain (PACKAGE, LOCALEDIR);
1935  textdomain (PACKAGE);
1936  @dots{}
1937@}
1938@end group
1939@end example
1940
1941@var{PACKAGE} and @var{LOCALEDIR} should be provided either by
1942@file{config.h} or by the Makefile.  For now consult the @code{gettext}
1943or @code{hello} sources for more information.
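
As a more complete sketch, the initialization can be collected in one
function together with the headers it needs.  The function name
@code{init_i18n} is hypothetical, and the fallback values given here for
@var{PACKAGE} and @var{LOCALEDIR} are assumptions for illustration only; a
real package obtains them from @file{config.h} or from the Makefile.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Assumed values, used only if the build system did not define them.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Hypothetical wrapper around the three initialization calls.
   Returns 0 on success and -1 if the text domain could not be set up.  */
int
init_i18n (void)
{
  /* If the environment names a locale that is not installed, setlocale
     returns NULL and the program keeps running in the "C" locale.  That
     is not treated as an error here.  */
  setlocale (LC_ALL, "");

  /* Tell gettext where the message catalogs of PACKAGE are installed.  */
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;

  /* Make PACKAGE the default domain for subsequent gettext calls.  */
  if (textdomain (PACKAGE) == NULL)
    return -1;

  return 0;
}
```

A program would call @code{init_i18n} once, early in @code{main}, before
the first message is looked up.  On systems where the @code{gettext}
functions are not part of the C library, linking may additionally require
the @code{intl} library.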
1944
1945@cindex locale category, LC_ALL
1946@cindex locale category, LC_CTYPE
1947The use of @code{LC_ALL} might not be appropriate for you.
1948@code{LC_ALL} includes all locale categories and especially
@code{LC_CTYPE}.  This latter category determines the character
classes reported by @code{isalnum} and the related functions from
@file{ctype.h}, which can be wrong for programs that process some
kind of input language.  For example, this would mean that source
code using the @,{c} (c-cedilla character) is runnable in
France but not in the U.S.
1955
Some systems also have problems with parsing numbers using the
@code{scanf} functions if a locale category other than @code{LC_ALL} is
used.  The standards say that formats in addition to the one known in the
@code{"C"} locale might be recognized.  But some systems seem to reject
numbers in the @code{"C"} locale format.  In some situations, the
notation itself can also be the problem, making it impossible to
recognize whether the number is in the @code{"C"} locale or the local
format.  This can happen if thousands separator characters are used.
Some locales, following national conventions, define this character
as @code{'.'}, which is the same character used in the
@code{"C"} locale to denote the decimal point.
1967
1968So it is sometimes necessary to replace the @code{LC_ALL} line in the
1969code above by a sequence of @code{setlocale} lines
1970
1971@example
1972@group
1973@{
1974  @dots{}
1975  setlocale (LC_CTYPE, "");
1976  setlocale (LC_MESSAGES, "");
1977  @dots{}
1978@}
1979@end group
1980@end example
1981
1982@cindex locale category, LC_CTYPE
1983@cindex locale category, LC_COLLATE
1984@cindex locale category, LC_MONETARY
1985@cindex locale category, LC_NUMERIC
1986@cindex locale category, LC_TIME
1987@cindex locale category, LC_MESSAGES
1988@cindex locale category, LC_RESPONSES
1989@noindent
1990On all POSIX conformant systems the locale categories @code{LC_CTYPE},
1991@code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
1992@code{LC_NUMERIC}, and @code{LC_TIME} are available.  On some systems
1993which are only ISO C compliant, @code{LC_MESSAGES} is missing, but
1994a substitute for it is defined in GNU gettext's @code{<libintl.h>} and
1995in GNU gnulib's @code{<locale.h>}.
1996
1997Note that changing the @code{LC_CTYPE} also affects the functions
1998declared in the @code{<ctype.h>} standard header and some functions
1999declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers.
2000If this is not
2001desirable in your application (for example in a compiler's parser),
2002you can use a set of substitute functions which hardwire the C locale,
2003such as found in the modules @samp{c-ctype}, @samp{c-strcase},
2004@samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib
2005source distribution.
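
To illustrate what ``hardwiring the C locale'' means, here is a minimal
sketch of a character test that never consults @code{LC_CTYPE}.  The
function name @code{ascii_isalnum} is hypothetical and this is not the
gnulib implementation; it also assumes an ASCII-compatible execution
character set, in which the letters of each case are contiguous.

```c
#include <stdbool.h>

/* Sketch of a locale-independent replacement for isalnum: only the
   ASCII digits and letters qualify, whatever locale is in effect.
   Assumption: ASCII-compatible execution character set.  */
bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
}
```

With such a function, a byte like 0xE7 (the c-cedilla in ISO-8859-1) is
rejected in every locale, so a parser built on it accepts the same input
in France and in the U.S.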
2006
It is also possible to switch the locale back and forth between the
2008environment dependent locale and the C locale, but this approach is
2009normally avoided because a @code{setlocale} call is expensive,
2010because it is tedious to determine the places where a locale switch
2011is needed in a large program's source, and because switching a locale
2012is not multithread-safe.
2013
2014@node Preparing Strings
2015@section Preparing Translatable Strings
2016
2017@cindex marking strings, preparations
2018Before strings can be marked for translations, they sometimes need to
2019be adjusted.  Usually preparing a string for translation is done right
2020before marking it, during the marking phase which is described in the
2021next sections.  What you have to keep in mind while doing that is the
2022following.
2023
2024@itemize @bullet
2025@item
2026Decent English style.
2027
2028@item
2029Entire sentences.
2030
2031@item
2032Split at paragraphs.
2033
2034@item
2035Use format strings instead of string concatenation.
2036
2037@item
2038Use placeholders in format strings instead of embedded URLs.
2039
2040@item
2041Avoid unusual markup and unusual control characters.
2042@end itemize
2043
2044@noindent
2045Let's look at some examples of these guidelines.
2046
2047@subheading Decent English style
2048
2049@cindex style
2050Translatable strings should be in good English style.  If slang language
2051with abbreviations and shortcuts is used, often translators will not
2052understand the message and will produce very inappropriate translations.
2053
2054@example
2055"%s: is parameter\n"
2056@end example
2057
2058@noindent
2059This is nearly untranslatable: Is the displayed item @emph{a} parameter or
2060@emph{the} parameter?
2061
2062@example
2063"No match"
2064@end example
2065
2066@noindent
2067The ambiguity in this message makes it unintelligible: Is the program
2068attempting to set something on fire? Does it mean "The given object does
2069not match the template"? Does it mean "The template does not fit for any
2070of the objects"?
2071
2072@cindex ambiguities
2073In both cases, adding more words to the message will help both the
2074translator and the English speaking user.
2075
2076@subheading Entire sentences
2077
2078@cindex sentences
2079Translatable strings should be entire sentences.  It is often not possible
2080to translate single verbs or adjectives in a substitutable way.
2081
2082@example
2083printf ("File %s is %s protected", filename, rw ? "write" : "read");
2084@end example
2085
2086@noindent
2087Most translators will not look at the source and will thus only see the
2088string @code{"File %s is %s protected"}, which is unintelligible.  Change
2089this to
2090
2091@example
2092printf (rw ? "File %s is write protected" : "File %s is read protected",
2093        filename);
2094@end example
2095
2096@noindent
2097This way the translator will not only understand the message, she will
2098also be able to find the appropriate grammatical construction.  A French
2099translator for example translates "write protected" like "protected
2100against writing".
2101
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence.  There are
2105usually more interdependencies between words than in English.  The
2106consequence is that asking a translator to translate two half-sentences
2107and then combining these two half-sentences through dumb string concatenation
2108will not work, for many languages, even though it would work for English.
2109That's why translators need to handle entire sentences.
2110
2111Often sentences don't fit into a single line.  If a sentence is output
2112using two subsequent @code{printf} statements, like this
2113
2114@example
2115printf ("Locale charset \"%s\" is different from\n", lcharset);
2116printf ("input file charset \"%s\".\n", fcharset);
2117@end example
2118
2119@noindent
2120the translator would have to translate two half sentences, but nothing
2121in the POT file would tell her that the two half sentences belong together.
2122It is necessary to merge the two @code{printf} statements so that the
2123translator can handle the entire sentence at once and decide at which
2124place to insert a line break in the translation (if at all):
2125
2126@example
2127printf ("Locale charset \"%s\" is different from\n\
2128input file charset \"%s\".\n", lcharset, fcharset);
2129@end example
2130
2131You may now ask: how about two or more adjacent sentences? Like in this case:
2132
2133@example
2134puts ("Apollo 13 scenario: Stack overflow handling failed.");
2135puts ("On the next stack overflow we will crash!!!");
2136@end example
2137
2138@noindent
Should these two statements be merged into a single one? I would recommend
merging them if the two sentences are related to each other, because then it
makes it easier for the translator to understand and translate both.  On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do a favour to the translator by not
merging the two.  (Identical messages occurring in several places are
combined by @code{xgettext}, so the translator has to handle them once only.)
2146
2147@subheading Split at paragraphs
2148
2149@cindex paragraphs
2150Translatable strings should be limited to one paragraph; don't let a
2151single message be longer than ten lines.  The reason is that when the
2152translatable string changes, the translator is faced with the task of
2153updating the entire translated string.  Maybe only a single word will
2154have changed in the English string, but the translator doesn't see that
2155(with the current translation tools), therefore she has to proofread
2156the entire message.
2157
2158@cindex help option
2159Many GNU programs have a @samp{--help} output that extends over several
2160screen pages.  It is a courtesy towards the translators to split such a
2161message into several ones of five to ten lines each.  While doing that,
2162you can also attempt to split the documented options into groups,
2163such as the input options, the output options, and the informative
2164output options.  This will help every user to find the option he is
2165looking for.
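
As a sketch of this advice, a @code{usage} function might emit one
translatable message per option group.  The option names and the grouping
below are invented for illustration; only the pattern of several short
messages is the point.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Print the help text as several short messages, one per option group,
   so that a change to one group invalidates only that translation.
   The options shown are hypothetical.  */
static void
usage (FILE *stream, const char *program_name)
{
  fprintf (stream, _("Usage: %s [OPTION]... FILE...\n"), program_name);
  fputs ("\n", stream);
  fputs (_("Input options:\n"
           "  -f, --from=FILE     read the file list from FILE\n"), stream);
  fputs ("\n", stream);
  fputs (_("Output options:\n"
           "  -o, --output=FILE   write output to FILE\n"), stream);
  fputs ("\n", stream);
  fputs (_("Informative output:\n"
           "  -h, --help          display this help and exit\n"
           "  -V, --version       output version information and exit\n"),
         stream);
}
```

The blank lines between the groups are written outside the translatable
strings, so translators do not have to reproduce them.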
2166
2167@subheading No string concatenation
2168
2169@cindex string concatenation
2170@cindex concatenation of strings
2171Hardcoded string concatenation is sometimes used to construct English
2172strings:
2173
2174@example
2175strcpy (s, "Replace ");
2176strcat (s, object1);
2177strcat (s, " with ");
2178strcat (s, object2);
2179strcat (s, "?");
2180@end example
2181
2182@noindent
2183In order to present to the translator only entire sentences, and also
2184because in some languages the translator might want to swap the order
2185of @code{object1} and @code{object2}, it is necessary to change this
2186to use a format string:
2187
2188@example
2189sprintf (s, "Replace %s with %s?", object1, object2);
2190@end example
2191
2192@cindex @code{inttypes.h}
2193A similar case is compile time concatenation of strings.  The ISO C 99
2194include file @code{<inttypes.h>} contains a macro @code{PRId64} that
2195can be used as a formatting directive for outputting an @samp{int64_t}
2196integer through @code{printf}.  It expands to a constant string, usually
2197"d" or "ld" or "lld" or something like this, depending on the platform.
2198Assume you have code like
2199
2200@example
2201printf ("The amount is %0" PRId64 "\n", number);
2202@end example
2203
2204@noindent
2205The @code{gettext} tools and library have special support for these
2206@code{<inttypes.h>} macros.  You can therefore simply write
2207
2208@example
2209printf (gettext ("The amount is %0" PRId64 "\n"), number);
2210@end example
2211
2212@noindent
2213The PO file will contain the string "The amount is %0<PRId64>\n".
2214The translators will provide a translation containing "%0<PRId64>"
2215as well, and at runtime the @code{gettext} function's result will
2216contain the appropriate constant string, "d" or "ld" or "lld".
2217
2218This works only for the predefined @code{<inttypes.h>} macros.  If
2219you have defined your own similar macros, let's say @samp{MYPRId64},
2220that are not known to @code{xgettext}, the solution for this problem
2221is to change the code like this:
2222
2223@example
2224char buf1[100];
2225sprintf (buf1, "%0" MYPRId64, number);
2226printf (gettext ("The amount is %s\n"), buf1);
2227@end example
2228
This way, you put the platform dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether in decimal, octal or hexadecimal.
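
A self-contained variant of this two-step pattern might look as follows.
It is only a sketch: the definition of @samp{MYPRId64} and the function
name @code{format_amount} are assumptions made for the example, and
@code{snprintf} is used instead of @code{sprintf} so that the buffer size
is checked explicitly.

```c
#include <libintl.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical project-specific macro, unknown to xgettext.  */
#define MYPRId64 "lld"

/* Format NUMBER into OUT (of size OUTSIZE).  The platform dependent
   directive stays out of the translatable string.  Returns the value
   of the final snprintf call.  */
static int
format_amount (char *out, size_t outsize, int64_t number)
{
  char buf1[100];

  /* Step 1: platform dependent formatting, not seen by translators.  */
  snprintf (buf1, sizeof buf1, "%" MYPRId64, (long long) number);
  /* Step 2: the translatable message contains only a %s directive.  */
  return snprintf (out, outsize, gettext ("The amount is %s\n"), buf1);
}
```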
2234
2235@cindex Java, string concatenation
2236@cindex C#, string concatenation
2237All this applies to other programming languages as well.  For example, in
2238Java and C#, string concatenation is very frequently used, because it is a
2239compiler built-in operator.  Like in C, in Java, you would change
2240
2241@example
2242System.out.println("Replace "+object1+" with "+object2+"?");
2243@end example
2244
2245@noindent
2246into a statement involving a format string:
2247
2248@example
2249System.out.println(
2250    MessageFormat.format("Replace @{0@} with @{1@}?",
2251                         new Object[] @{ object1, object2 @}));
2252@end example
2253
2254@noindent
2255Similarly, in C#, you would change
2256
2257@example
2258Console.WriteLine("Replace "+object1+" with "+object2+"?");
2259@end example
2260
2261@noindent
2262into a statement involving a format string:
2263
2264@example
2265Console.WriteLine(
2266    String.Format("Replace @{0@} with @{1@}?", object1, object2));
2267@end example
2268
2269@subheading No embedded URLs
2270
2271It is good to not embed URLs in translatable strings, for several reasons:
2272@itemize @bullet
2273@item
2274It avoids possible mistakes during copy and paste.
2275@item
2276Translators cannot translate the URLs or, by mistake, use the URLs from
2277other packages that are present in their compendium.
2278@item
2279When the URLs change, translators don't need to revisit the translation
2280of the string.
2281@end itemize
2282
2283The same holds for email addresses.
2284
2285So, you would change
2286
2287@example
2288fputs (_("GNU GPL version 3 <https://gnu.org/licenses/gpl.html>\n"),
2289       stream);
2290@end example
2291
2292@noindent
2293to
2294
2295@example
2296fprintf (stream, _("GNU GPL version 3 <%s>\n"),
2297         "https://gnu.org/licenses/gpl.html");
2298@end example
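
The same treatment applies to email addresses.  In the following sketch
the address and the function name are placeholders invented for the
example:

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical contact address, kept out of the translatable string.  */
#define CONTACT_ADDRESS "maintainer@example.org"

/* Print the contact line; the translator sees only the %s placeholder.  */
static int
print_contact (FILE *stream)
{
  return fprintf (stream, _("Send questions to <%s>.\n"), CONTACT_ADDRESS);
}
```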
2299
2300@subheading No unusual markup
2301
2302@cindex markup
2303@cindex control characters
2304Unusual markup or control characters should not be used in translatable
2305strings.  Translators will likely not understand the particular meaning
2306of the markup or control characters.
2307
2308For example, if you have a convention that @samp{|} delimits the
2309left-hand and right-hand part of some GUI elements, translators will
2310often not understand it without specific comments.  It might be
2311better to have the translator translate the left-hand and right-hand
2312part separately.
2313
2314Another example is the @samp{argp} convention to use a single @samp{\v}
2315(vertical tab) control character to delimit two sections inside a
2316string.  This is flawed.  Some translators may convert it to a simple
2317newline, some to blank lines.  With some PO file editors it may not be
2318easy to even enter a vertical tab control character.  So, you cannot
2319be sure that the translation will contain a @samp{\v} character, at the
2320corresponding position.  The solution is, again, to let the translator
2321translate two separate strings and combine at run-time the two translated
2322strings with the @samp{\v} required by the convention.
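
One possible way to do this is sketched below.  The helper names
@code{make_argp_doc} and @code{make_program_doc} and the two messages are
assumptions made for the example; the only point is that the @samp{\v}
is added by the program, not by the translator.

```c
#include <libintl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define _(String) gettext (String)

/* Join two separately translated sections with the vertical tab that
   the argp convention expects.  Returns a malloc'ed string, or NULL if
   memory is exhausted.  The function name is hypothetical.  */
static char *
make_argp_doc (const char *before_options, const char *after_options)
{
  size_t size = strlen (before_options) + 1 + strlen (after_options) + 1;
  char *doc = malloc (size);

  if (doc != NULL)
    snprintf (doc, size, "%s\v%s", before_options, after_options);
  return doc;
}

/* Example use: each section is an ordinary translatable message.  */
static char *
make_program_doc (void)
{
  return make_argp_doc (_("Count the words in each FILE."),
                        _("With no FILE, read standard input."));
}
```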
2323
2324HTML markup, however, is common enough that it's probably ok to use in
2325translatable strings.  But please bear in mind that the GNU gettext tools
2326don't verify that the translations are well-formed HTML.
2327
2328@node Mark Keywords
2329@section How Marks Appear in Sources
2330@cindex marking strings that require translation
2331
2332All strings requiring translation should be marked in the C sources.  Marking
2333is done in such a way that each translatable string appears to be
2334the sole argument of some function or preprocessor macro.  There are
2335only a few such possible functions or macros meant for translation,
2336and their names are said to be marking keywords.  The marking is
2337attached to strings themselves, rather than to what we do with them.
2338This approach has more uses.  A blatant example is an error message
2339produced by formatting.  The format string needs translation, as
2340well as some strings inserted through some @samp{%s} specification
2341in the format, while the result from @code{sprintf} may have so many
2342different instances that it is impractical to list them all in some
2343@samp{error_string_out()} routine, say.
2344
2345This marking operation has two goals.  The first goal of marking
2346is for triggering the retrieval of the translation, at run time.
2347The keyword is possibly resolved into a routine able to dynamically
2348return the proper translation, as far as possible or wanted, for the
2349argument string.  Most localizable strings are found in executable
2350positions, that is, attached to variables or given as parameters to
2351functions.  But this is not universal usage, and some translatable
2352strings appear in structured initializations.  @xref{Special cases}.
2353
2354The second goal of the marking operation is to help @code{xgettext}
2355at properly extracting all translatable strings when it scans a set
2356of program sources and produces PO file templates.
2357
The canonical keyword for marking translatable strings is
@samp{gettext}; it gave its name to the whole GNU @code{gettext}
package.  For packages making only light use of the @samp{gettext}
2361keyword, macro or function, it is easily used @emph{as is}.  However,
2362for packages using the @code{gettext} interface more heavily, it
2363is usually more convenient to give the main keyword a shorter, less
2364obtrusive name.  Indeed, the keyword might appear on a lot of strings
2365all over the package, and programmers usually do not want nor need
2366their program sources to remind them forcefully, all the time, that they
2367are internationalized.  Further, a long keyword has the disadvantage
2368of using more horizontal space, forcing more indentation work on
2369sources for those trying to keep them within 79 or 80 columns.
2370
2371@cindex @code{_}, a macro to mark strings for translation
2372Many packages use @samp{_} (a simple underline) as a keyword,
2373and write @samp{_("Translatable string")} instead of @samp{gettext
2374("Translatable string")}.  Further, the coding rule, from GNU standards,
2375wanting that there is a space between the keyword and the opening
2376parenthesis is relaxed, in practice, for this particular usage.
2377So, the textual overhead per translatable string is reduced to
2378only three characters: the underline and the two parentheses.
2379However, even if GNU @code{gettext} uses this convention internally,
2380it does not offer it officially.  The real, genuine keyword is truly
2381@samp{gettext} indeed.  It is fairly easy for those wanting to use
2382@samp{_} instead of @samp{gettext} to declare:
2383
2384@example
2385#include <libintl.h>
2386#define _(String) gettext (String)
2387@end example
2388
2389@noindent
2390instead of merely using @samp{#include <libintl.h>}.
2391
2392The marking keywords @samp{gettext} and @samp{_} take the translatable
2393string as sole argument.  It is also possible to define marking functions
2394that take it at another argument position.  It is even possible to make
2395the marked argument position depend on the total number of arguments of
2396the function call; this is useful in C++.  All this is achieved using
2397@code{xgettext}'s @samp{--keyword} option.  How to pass such an option
2398to @code{xgettext}, assuming that @code{gettextize} is used, is described
2399in @ref{po/Makevars} and @ref{AM_XGETTEXT_OPTION}.
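
For illustration, here is a sketch of a marking function that takes the
translatable string as its second argument.  The names @code{report} and
@code{report_missing} are hypothetical; with such a function,
@code{xgettext} would need an option of the form
@samp{--keyword=report:2} in order to extract the string.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical reporting function whose translatable format string is
   the second argument.  The translation is looked up here, at run time.  */
static int
report (FILE *stream, const char *msgid, const char *filename)
{
  return fprintf (stream, gettext (msgid), filename);
}

/* The string literal below is what xgettext has to find; it is the
   second argument of the call to report.  */
static int
report_missing (FILE *stream, const char *filename)
{
  return report (stream, "cannot open file %s\n", filename);
}
```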
2400
2401Note also that long strings can be split across lines, into multiple
2402adjacent string tokens.  Automatic string concatenation is performed
2403at compile time according to ISO C and ISO C++; @code{xgettext} also
2404supports this syntax.
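
A small sketch of such a split string follows; the message text and the
function name are made up for the example.

```c
#include <libintl.h>

#define _(String) gettext (String)

/* The three adjacent string tokens form a single string literal, so
   the translator sees one message.  */
static const char *
overwrite_warning (void)
{
  return _("The output file already exists.  "
           "Use the --force option to overwrite it, "
           "or choose a different file name.");
}
```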
2405
2406Later on, the maintenance is relatively easy.  If, as a programmer,
2407you add or modify a string, you will have to ask yourself if the
2408new or altered string requires translation, and include it within
@samp{_()} if you think it should be translated.  For example, @samp{"%s"}
is a string that does @emph{not} require translation.  But
2411@samp{"%s: %d"} @emph{does} require translation, because in French, unlike
2412in English, it's customary to put a space before a colon.
2413
2414@node Marking
2415@section Marking Translatable Strings
2416@emindex marking strings for translation
2417
2418In PO mode, one set of features is meant more for the programmer than
2419for the translator, and allows him to interactively mark which strings,
2420in a set of program sources, are translatable, and which are not.
2421Even if it is a fairly easy job for a programmer to find and mark
2422such strings by other means, using any editor of his choice, PO mode
2423makes this work more comfortable.  Further, this gives translators
2424who feel a little like programmers, or programmers who feel a little
2425like translators, a tool letting them work at marking translatable
2426strings in the program sources, while simultaneously producing a set of
2427translation in some language, for the package being internationalized.
2428
2429@emindex @code{etags}, using for marking strings
2430The set of program sources, targeted by the PO mode commands describe
2431here, should have an Emacs tags table constructed for your project,
2432prior to using these PO file commands.  This is easy to do.  In any
2433shell window, change the directory to the root of your project, then
2434execute a command resembling:
2435
2436@example
2437etags src/*.[hc] lib/*.[hc]
2438@end example
2439
2440@noindent
2441presuming here you want to process all @file{.h} and @file{.c} files
2442from the @file{src/} and @file{lib/} directories.  This command will
2443explore all said files and create a @file{TAGS} file in your root
2444directory, somewhat summarizing the contents using a special file
2445format Emacs can understand.
2446
2447@emindex @file{TAGS}, and marking translatable strings
2448For packages following the GNU coding standards, there is
2449a make goal @code{tags} or @code{TAGS} which constructs the tag files in
2450all directories and for all files containing source code.
2451
2452Once your @file{TAGS} file is ready, the following commands assist
2453the programmer at marking translatable strings in his set of sources.
2454But these commands are necessarily driven from within a PO file
2455window, and it is likely that you do not even have such a PO file yet.
2456This is not a problem at all, as you may safely open a new, empty PO
2457file, mainly for using these commands.  This empty PO file will slowly
2458fill in while you mark strings as translatable in your program sources.
2459
2460@table @kbd
2461@item ,
2462@efindex ,@r{, PO Mode command}
2463Search through program sources for a string which looks like a
2464candidate for translation (@code{po-tags-search}).
2465
2466@item M-,
2467@efindex M-,@r{, PO Mode command}
2468Mark the last string found with @samp{_()} (@code{po-mark-translatable}).
2469
2470@item M-.
2471@efindex M-.@r{, PO Mode command}
2472Mark the last string found with a keyword taken from a set of possible
2473keywords.  This command with a prefix allows some management of these
2474keywords (@code{po-select-mark-and-mark}).
2475
2476@end table
2477
2478@efindex po-tags-search@r{, PO Mode command}
2479The @kbd{,} (@code{po-tags-search}) command searches for the next
2480occurrence of a string which looks like a possible candidate for
2481translation, and displays the program source in another Emacs window,
2482positioned in such a way that the string is near the top of this other
2483window.  If the string is too big to fit whole in this window, it is
2484positioned so only its end is shown.  In any case, the cursor
2485is left in the PO file window.  If the shown string would be better
2486presented differently in different native languages, you may mark it
2487using @kbd{M-,} or @kbd{M-.}.  Otherwise, you might rather ignore it
2488and skip to the next string by merely repeating the @kbd{,} command.
2489
2490A string is a good candidate for translation if it contains a sequence
2491of three or more letters.  A string containing at most two letters in
2492a row will be considered as a candidate if it has more letters than
2493non-letters.  The command disregards strings containing no letters,
2494or isolated letters only.  It also disregards strings within comments,
2495or strings already marked with some keyword PO mode knows (see below).
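
The rules about letters can be sketched in C as follows.  This is only
one reading of the description above, not the code PO mode actually uses
(PO mode is written in Emacs Lisp); the function name is hypothetical,
and the checks for comments and for already marked strings are left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if STRING looks like a candidate for translation,
   following the rules stated above.  Letters are whatever isalpha
   accepts in the current locale.  */
static bool
looks_translatable (const char *string)
{
  size_t letters = 0, non_letters = 0;
  size_t run = 0, longest_run = 0;

  for (const char *p = string; *p != '\0'; p++)
    {
      if (isalpha ((unsigned char) *p))
        {
          letters++;
          run++;
          if (run > longest_run)
            longest_run = run;
        }
      else
        {
          non_letters++;
          run = 0;
        }
    }

  if (longest_run >= 3)
    return true;                      /* three or more letters in a row */
  if (longest_run == 2)
    return letters > non_letters;     /* short runs: letters must dominate */
  return false;                       /* no letters, or isolated letters */
}
```

Under this reading, @samp{"Replace %s with %s?"} and @samp{"ok"} are
candidates, while @samp{"%s"} and @samp{"%d: ok"} are not.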
2496
2497If you have never told Emacs about some @file{TAGS} file to use, the
2498command will request that you specify one from the minibuffer, the
2499first time you use the command.  You may later change your @file{TAGS}
2500file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
2501which will ask you to name the precise @file{TAGS} file you want
2502to use.  @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.
2503
2504Each time you use the @kbd{,} command, the search resumes from where it was
2505left by the previous search, and goes through all program sources,
2506obeying the @file{TAGS} file, until all sources have been processed.
2507However, by giving a prefix argument to the command @w{(@kbd{C-u
2508,})}, you may request that the search be restarted all over again
2509from the first program source; but in this case, strings that you
2510recently marked as translatable will be automatically skipped.
2511
2512Using this @kbd{,} command does not prevent using of other regular
2513Emacs tags commands.  For example, regular @code{tags-search} or
2514@code{tags-query-replace} commands may be used without disrupting the
2515independent @kbd{,} search sequence.  However, as implemented, the
2516@emph{initial} @kbd{,} command (or the @kbd{,} command is used with a
2517prefix) might also reinitialize the regular Emacs tags searching to the
2518first tags file, this reinitialization might be considered spurious.
2519
2520@efindex po-mark-translatable@r{, PO Mode command}
2521@efindex po-select-mark-and-mark@r{, PO Mode command}
2522The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
2523recently found string with the @samp{_} keyword.  The @kbd{M-.}
2524(@code{po-select-mark-and-mark}) command will request that you type
2525one keyword from the minibuffer and use that keyword for marking
2526the string.  Both commands will automatically create a new PO file
2527untranslated entry for the string being marked, and make it the
2528current entry (making it easy for you to immediately proceed to its
2529translation, if you feel like doing it right away).  It is possible
2530that the modifications made to the program source by @kbd{M-,} or
2531@kbd{M-.} render some source line longer than 80 columns, forcing you
2532to break and re-indent this line differently.  You may use the @kbd{O}
2533command from PO mode, or any other window changing command from
2534Emacs, to break out into the program source window, and do any
2535needed adjustments.  You will have to use some regular Emacs command
2536to return the cursor to the PO file window, if you want command
2537@kbd{,} for the next string, say.
2538
2539The @kbd{M-.} command has a few built-in speedups, so you do not
2540have to explicitly type all keywords all the time.  The first such
2541speedup is that you are presented with a @emph{preferred} keyword,
2542which you may accept by merely typing @kbd{@key{RET}} at the prompt.
2543The second speedup is that you may type any non-ambiguous prefix of the
2544keyword you really mean, and the command will complete it automatically
2545for you.  This also means that PO mode has to @emph{know} all
2546your possible keywords, and that it will not accept mistyped keywords.
2547
2548If you reply @kbd{?} to the keyword request, the command gives a
2549list of all known keywords, from which you may choose.  When the
2550command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
2551updating any program source or PO file buffer, and does some simple
2552keyword management instead.  In this case, the command asks for a
2553keyword, written in full, which becomes a new allowed keyword for
2554later @kbd{M-.} commands.  Moreover, this new keyword automatically
2555becomes the @emph{preferred} keyword for later commands.  By typing
2556an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
2557changes the @emph{preferred} keyword and does nothing more.
2558
2559All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
2560when scanning for strings, and strings already marked by any of those
2561known keywords are automatically skipped.  If many PO files are opened
2562simultaneously, each one has its own independent set of known keywords.
2563There is no provision in PO mode, currently, for deleting a known
2564keyword, you have to quit the file (maybe using @kbd{q}) and reopen
2565it afresh.  When a PO file is newly brought up in an Emacs window, only
2566@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
2567is preferred for the @kbd{M-.} command.  In fact, this is not useful to
2568prefer @samp{_}, as this one is already built in the @kbd{M-,} command.
2569
2570@node c-format Flag
2571@section Special Comments preceding Keywords
2572
2573@c FIXME document c-format and no-c-format.
2574
2575@cindex format strings
2576In C programs strings are often used within calls of functions from the
2577@code{printf} family.  The special thing about these format strings is
2578that they can contain format specifiers introduced with @kbd{%}.  Assume
2579we have the code
2580
@example
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
@end example
2584
2585@noindent
2586A possible German translation for the above string might be:
2587
2588@example
2589"%d Zeichen lang ist die Zeichenkette `%s'"
2590@end example
2591
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments in the @code{printf}
call has not.  This will most probably lead to problems because now the
length of the string is regarded as the address.
2597
2598To prevent errors at runtime caused by translations, the @code{msgfmt}
2599tool can check statically whether the arguments in the original and the
2600translation string match in type and number.  If this is not the case
2601and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt}
2602will give an error and refuse to produce a MO file.  Thus consistent
2603use of @samp{msgfmt -c} will catch the error, so that it cannot cause
2604problems at runtime.
2605
2606@noindent
To keep the word order of the above German translation and still be
correct, one would have to write
2609
2610@example
2611"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
2612@end example
2613
2614@noindent
2615The routines in @code{msgfmt} know about this special notation.
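
The effect of the numbered notation at run time can be seen in the
following sketch.  The German string is written directly as the format
string here, standing in for what @code{gettext} would return; the
function name is hypothetical.  Numbered argument references are a POSIX
extension to the ISO C @code{printf} family, so this assumes a POSIX
conforming C library.

```c
#include <stdio.h>
#include <string.h>

/* Format with numbered argument references.  The arguments are passed
   in the original order (string first, length second); the format
   string picks them by position.  */
static int
describe_length (char *out, size_t outsize, const char *s)
{
  return snprintf (out, outsize,
                   "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n",
                   s, (int) strlen (s));
}
```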
2616
2617Because not all strings in a program will be format strings, it is not
2618useful for @code{msgfmt} to test all the strings in the @file{.po} file.
2619This might cause problems because the string might contain what looks
2620like a format specifier, but the string is not used in @code{printf}.
2621
2622Therefore @code{xgettext} adds a special tag to those messages it
2623thinks might be a format string.  There is no absolute rule for this,
2624only a heuristic.  In the @file{.po} file the entry is marked using the
2625@code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}).
2626
2627@kwindex c-format@r{, and @code{xgettext}}
2628@kwindex no-c-format@r{, and @code{xgettext}}
2629The careful reader now might say that this again can cause problems.
2630The heuristic might guess it wrong.  This is true and therefore
2631@code{xgettext} knows about a special kind of comment which lets
2632the programmer take over the decision.  If in the same line as or
2633the immediately preceding line to the @code{gettext} keyword
2634the @code{xgettext} program finds a comment containing the words
2635@code{xgettext:c-format}, it will mark the string in any case with
2636the @code{c-format} flag.  This kind of comment should be used when
2637@code{xgettext} does not recognize the string as a format string but
2638it really is one and it should be tested.  Please note that when the
2639comment is in the same line as the @code{gettext} keyword, it must be
2640before the string to be translated.
2641
2642This situation happens quite often.  The @code{printf} function is often
2643called with strings which do not contain a format specifier.  Of course
2644one would normally use @code{fputs} but it does happen.  In this case
2645@code{xgettext} does not recognize this as a format string but what
2646happens if the translation introduces a valid format specifier?  The
2647@code{printf} function will try to access one of the parameters but none
2648exists because the original code does not pass any parameters.
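
A sketch of such a call, with the special comment placed on the line
immediately preceding the @code{gettext} keyword, could look like this
(the message and the function name are invented for the example):

```c
#include <libintl.h>
#include <stdio.h>

/* The message contains no format directive, but it is used as a format
   string.  The comment asks xgettext to flag it as c-format anyway, so
   that translations can be checked.  */
static int
print_done (FILE *stream)
{
  /* xgettext:c-format */
  return fprintf (stream, gettext ("Operation completed\n"));
}
```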
2649
2650@code{xgettext} of course could make a wrong decision the other way
2651round, i.e.@: a string marked as a format string actually is not a format
string.  In this case @code{msgfmt} might give too many warnings and
2653would prevent translating the @file{.po} file.  The method to prevent
2654this wrong decision is similar to the one used above, only the comment
2655to use must contain the string @code{xgettext:no-c-format}.
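
As an illustration, consider a message containing a literal percent sign
that is never passed to @code{printf}.  Whether the heuristic of
@code{xgettext} would really take this particular string for a format
string is an assumption; the sketch only shows where the comment goes.
The function name is hypothetical.

```c
#include <libintl.h>
#include <stdio.h>

/* The percent sign here is plain text, and the string is written with
   fputs, so it must not be checked as a format string.  */
static int
print_disk_full (FILE *stream)
{
  /* xgettext:no-c-format */
  return fputs (gettext ("The disk is 100% full\n"), stream);
}
```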
2656
2657If a string is marked with @code{c-format} and this is not correct the
2658user can find out who is responsible for the decision.  See
2659@ref{xgettext Invocation} to see how the @code{--debug} option can be
2660used for solving this problem.
2661
2662@node Special cases
2663@section Special Cases of Translatable Strings
2664
2665@cindex marking string initializers
2666The attentive reader might now point out that it is not always possible
to mark a translatable string with @code{gettext} or something like this.
2668Consider the following case:
2669
@example
@group
@{
  static const char *messages[] = @{
    "some very meaningful message",
    "and another one"
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  @dots{}
@}
@end group
@end example
2687
2688While it is no problem to mark the string @code{"a default message"} it
2689is not possible to mark the string initializers for @code{messages}.
2690What is to be done?  We have to fulfill two tasks.  First we have to mark the
2691strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
can find them, and second we have to translate the strings at runtime
before printing them.
2694
2695The first task can be fulfilled by creating a new keyword, which names a
2696no-op.  For the second we have to mark all access points to a string
2697from the array.  So one solution can look like this:
2698
@example
@group
#define gettext_noop(String) String

@{
  static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  @dots{}
@}
@end group
@end example
2718
Please convince yourself that the string which is written by
@code{fputs} is translated in any case.  How to make @code{xgettext}
aware of the additional keyword @code{gettext_noop} is explained in
@ref{xgettext Invocation}.
2723
The above is of course not the only solution.  You could also come up
with the following one:
2726
@example
@group
#define gettext_noop(String) String

@{
  static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  @dots{}
@}
@end group
@end example
2746
2747But this has a drawback.  The programmer has to take care that
2748he uses @code{gettext_noop} for the string @code{"a default message"}.
2749A use of @code{gettext} could have in rare cases unpredictable results.
2750
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case.  But this analysis is
generally not very difficult.  If it ever is, you can use this second
method.
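
For reference, the second method can be packaged as a small
self-contained function.  This is a sketch only: the function name
@code{message_for} is hypothetical, and the range check on the index is
an addition that the fragments above leave out.

```c
#include <libintl.h>
#include <stddef.h>

#define gettext_noop(String) String

/* Marked for extraction only; translated when they are used.  */
static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Return the translated message for INDEX, or a translated default
   message when INDEX is out of range.  Every path goes through the
   single gettext call.  */
static const char *
message_for (size_t index)
{
  size_t count = sizeof messages / sizeof messages[0];

  return gettext (index < count
                  ? messages[index]
                  : gettext_noop ("a default message"));
}
```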
2755
2756@node Bug Report Address
2757@section Letting Users Report Translation Bugs
2758
2759Code sometimes has bugs, but translations sometimes have bugs too.  The
2760users need to be able to report them.  Reporting translation bugs to the
2761programmer or maintainer of a package is not very useful, since the
2762maintainer must never change a translation, except on behalf of the
2763translator.  Hence the translation bugs must be reported to the
2764translators.
2765
2766Here is a way to organize this so that the maintainer does not need to
2767forward translation bug reports, nor even keep a list of the addresses of
2768the translators or their translation teams.
2769
Every program has a place where it shows the bug report address.  For
2771GNU programs, it is the code which handles the ``--help'' option,
2772typically in a function called ``usage''.  In this place, instruct the
2773translator to add her own bug reporting address.  For example, if that
2774code has a statement
2775
2776@example
2777@group
2778printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
2779@end group
2780@end example
2781
2782you can add some translator instructions like this:
2783
2784@example
2785@group
2786/* TRANSLATORS: The placeholder indicates the bug-reporting address
2787   for this package.  Please add _another line_ saying
2788   "Report translation bugs to <...>\n" with the address for translation
2789   bugs (typically your translation team's web or email address).  */
2790printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
2791@end group
2792@end example
2793
2794These will be extracted by @samp{xgettext}, leading to a .pot file that
2795contains this:
2796
2797@example
2798@group
2799#. TRANSLATORS: The placeholder indicates the bug-reporting address
2800#. for this package.  Please add _another line_ saying
2801#. "Report translation bugs to <...>\n" with the address for translation
2802#. bugs (typically your translation team's web or email address).
2803#: src/hello.c:178
2804#, c-format
2805msgid "Report bugs to <%s>.\n"
2806msgstr ""
2807@end group
2808@end example
2809
2810@node Names
2811@section Marking Proper Names for Translation
2812
2813Should names of persons, cities, locations etc. be marked for translation
2814or not?  People who only know languages that can be written with Latin
2815letters (English, Spanish, French, German, etc.) are tempted to say ``no'',
2816because names usually do not change when transported between these languages.
2817However, in general when translating from one script to another, names
2818are translated too, usually phonetically or by transliteration.  For
2819example, Russian or Greek names are converted to the Latin alphabet when
2820being translated to English, and English or French names are converted
2821to the Katakana script when being translated to Japanese.  This is
2822necessary because the speakers of the target language in general cannot
2823read the script the name is originally written in.
2824
2825As a programmer, you should therefore make sure that names are marked
2826for translation, with a special comment telling the translators that it
2827is a proper name and how to pronounce it.  In its simple form, it looks
2828like this:
2829
2830@example
2831@group
2832printf (_("Written by %s.\n"),
2833        /* TRANSLATORS: This is a proper name.  See the gettext
2834           manual, section Names.  Note this is actually a non-ASCII
2835           name: The first name is (with Unicode escapes)
2836           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2837           Pronunciation is like "fraa-swa pee-nar".  */
2838        _("Francois Pinard"));
2839@end group
2840@end example
2841
2842@noindent
2843The GNU gnulib library offers a module @samp{propername}
2844(@url{https://www.gnu.org/software/gnulib/MODULES.html#module=propername})
2845which takes care to automatically append the original name, in parentheses,
2846to the translated name.  For names that cannot be written in ASCII, it
2847also frees the translator from the task of entering the appropriate non-ASCII
2848characters if no script change is needed.  In this more comfortable form,
2849it looks like this:
2850
2851@example
2852@group
2853printf (_("Written by %s and %s.\n"),
2854        proper_name ("Ulrich Drepper"),
2855        /* TRANSLATORS: This is a proper name.  See the gettext
2856           manual, section Names.  Note this is actually a non-ASCII
2857           name: The first name is (with Unicode escapes)
2858           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2859           Pronunciation is like "fraa-swa pee-nar".  */
2860        proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));
2861@end group
2862@end example
2863
2864@noindent
2865You can also write the original name directly in Unicode (rather than with
2866Unicode escapes or HTML entities) and denote the pronunciation using the
2867International Phonetic Alphabet (see
2868@url{https://en.wikipedia.org/wiki/International_Phonetic_Alphabet}).
2869
2870As a translator, you should use some care when translating names, because
2871it is frustrating if people see their names mutilated or distorted.
2872
2873If your language uses the Latin script, all you need to do is to reproduce
2874the name as perfectly as you can within the usual character set of your
2875language.  In this particular case, this means to provide a translation
2876containing the c-cedilla character.  If your language uses a different
2877script and the people speaking it don't usually read Latin words, it means
2878transliteration.  If the programmer used the simple case, you should still
2879give, in parentheses, the original writing of the name -- for the sake of
2880the people that do read the Latin script.  If the programmer used the
2881@samp{propername} module mentioned above, you don't need to give the original
2882writing of the name in parentheses, because the program will already do so.
2883Here is an example, using Greek as the target script:
2884
2885@example
2886@group
2887#. This is a proper name.  See the gettext
2888#. manual, section Names.  Note this is actually a non-ASCII
2889#. name: The first name is (with Unicode escapes)
2890#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2891#. Pronunciation is like "fraa-swa pee-nar".
2892msgid "Francois Pinard"
2893msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
2894       " (Francois Pinard)"
2895@end group
2896@end example
2897
2898Because translation of names is such a sensitive domain, it is a good
2899idea to test your translation before submitting it.
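
The effect of appending the original name in parentheses can be pictured
with the following simplified sketch.  It is not the code of the
@samp{propername} module: the function @code{format_proper_name} is
hypothetical, it takes the translation as an argument instead of looking
it up, and it compares plain bytes only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the gnulib implementation: writes the name to
   show for ORIGINAL, given its TRANSLATION, into BUF of SIZE bytes.
   Returns the length the full result needs, as snprintf does.  */
int
format_proper_name (char *buf, size_t size,
                    const char *original, const char *translation)
{
  /* No catalog entry, or the translator kept the name unchanged.  */
  if (strcmp (translation, original) == 0)
    return snprintf (buf, size, "%s", original);

  /* The translator already included the original writing.  */
  if (strstr (translation, original) != NULL)
    return snprintf (buf, size, "%s", translation);

  /* Script change: append the original writing in parentheses.  */
  return snprintf (buf, size, "%s (%s)", translation, original);
}
```

With a transliterated @code{translation}, the result has the form
@samp{@var{translation} (@var{original})}, which is what a translator
would otherwise have to type by hand in the simple case.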
2900
2901@node Libraries
2902@section Preparing Library Sources
2903
2904When you are preparing a library, not a program, for the use of
2905@code{gettext}, only a few details are different.  Here we assume that
2906the library has a translation domain and a POT file of its own.  (If
2907it uses the translation domain and POT file of the main program, then
2908the previous sections apply without changes.)
2909
2910@enumerate
2911@item
2912The library code doesn't call @code{setlocale (LC_ALL, "")}.  It's the
2913responsibility of the main program to set the locale.  The library's
2914documentation should mention this fact, so that developers of programs
2915using the library are aware of it.
2916
2917@item
2918The library code doesn't call @code{textdomain (PACKAGE)}, because it
2919would interfere with the text domain set by the main program.
2920
2921@item
2922The initialization code for a program was
2923
2924@smallexample
2925  setlocale (LC_ALL, "");
2926  bindtextdomain (PACKAGE, LOCALEDIR);
2927  textdomain (PACKAGE);
2928@end smallexample
2929
2930@noindent
2931For a library it is reduced to
2932
2933@smallexample
2934  bindtextdomain (PACKAGE, LOCALEDIR);
2935@end smallexample
2936
2937@noindent
2938If your library's API doesn't already have an initialization function,
2939you need to create one, containing at least the @code{bindtextdomain}
2940invocation.  However, you usually don't need to export and document this
2941initialization function: It is sufficient that all entry points of the
2942library call the initialization function if it hasn't been called before.
2943The typical idiom used to achieve this is a static boolean variable that
2944indicates whether the initialization function has been called. Like this:
2945
2946@example
2947@group
2948static bool libfoo_initialized;
2949
2950static void
2951libfoo_initialize (void)
2952@{
2953  bindtextdomain (PACKAGE, LOCALEDIR);
2954  libfoo_initialized = true;
2955@}
2956
2957/* This function is part of the exported API.  */
2958struct foo *
2959create_foo (...)
2960@{
2961  /* Must ensure the initialization is performed.  */
2962  if (!libfoo_initialized)
2963    libfoo_initialize ();
2964  ...
2965@}
2966
2967/* This function is part of the exported API.  The argument must be
2968   non-NULL and have been created through create_foo().  */
2969int
2970foo_refcount (struct foo *argument)
2971@{
2972  /* No need to invoke the initialization function here, because
2973     create_foo() must already have been called before.  */
2974  ...
2975@}
2976@end group
2977@end example
2978
2979@item
2980The usual declaration of the @samp{_} macro in each source file was
2981
2982@smallexample
2983#include <libintl.h>
2984#define _(String) gettext (String)
2985@end smallexample
2986
2987@noindent
2988for a program.  For a library, which has its own translation domain,
2989it reads like this:
2990
2991@smallexample
2992#include <libintl.h>
2993#define _(String) dgettext (PACKAGE, String)
2994@end smallexample
2995
2996In other words, @code{dgettext} is used instead of @code{gettext}.
2997Similarly, the @code{dngettext} function should be used in place of the
2998@code{ngettext} function.
2999@end enumerate
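
The points above can be combined into one small source file.  The
following is a sketch under stated assumptions: the values of
@code{PACKAGE} and @code{LOCALEDIR} and the two exported functions are
invented for illustration, and a real build system would normally
supply the two macros.

```c
#include <stdbool.h>
#include <libintl.h>

/* Hypothetical values; a real build system would normally define these.  */
#define PACKAGE "libfoo"
#define LOCALEDIR "/usr/share/locale"

/* Library flavour of the usual macro: look up in the library's own domain.  */
#define _(String) dgettext (PACKAGE, String)

static bool libfoo_initialized;

/* No setlocale and no textdomain call here; only the binding.  */
static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical exported function: returns a format string for COUNT items,
   choosing the plural form through dngettext.  */
const char *
libfoo_items_format (unsigned long count)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return dngettext (PACKAGE, "%lu item", "%lu items", count);
}

/* Hypothetical exported function using the _ macro.  */
const char *
libfoo_error_text (void)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return _("operation failed");
}
```

Note that the static flag is not protected against concurrent callers.
A library meant for multithreaded programs would need a one-time
initialization facility instead, which this sketch leaves out.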
3000
3001@node Template
3002@chapter Making the PO Template File
3003@cindex PO template file
3004
After preparing the sources, the programmer creates a PO template file.
This chapter explains how to use @code{xgettext} for this purpose.
3007
3008@code{xgettext} creates a file named @file{@var{domainname}.po}.  You
3009should then rename it to @file{@var{domainname}.pot}.  (Why doesn't
3010@code{xgettext} create it under the name @file{@var{domainname}.pot}
3011right away?  The answer is: for historical reasons.  When @code{xgettext}
3012was specified, the distinction between a PO file and PO file template
3013was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.)
3014
3015@c FIXME: Rewrite.
3016
3017@menu
3018* xgettext Invocation::         Invoking the @code{xgettext} Program
3019@end menu
3020
3021@node xgettext Invocation
3022@section Invoking the @code{xgettext} Program
3023
3024@include xgettext.texi
3025
3026@node Creating
3027@chapter Creating a New PO File
3028@cindex creating a new PO file
3029
3030When starting a new translation, the translator creates a file called
3031@file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template
3032file with modifications in the initial comments (at the beginning of the file)
3033and in the header entry (the first entry, near the beginning of the file).
3034
3035The easiest way to do so is by use of the @samp{msginit} program.
3036For example:
3037
3038@example
3039$ cd @var{PACKAGE}-@var{VERSION}
3040$ cd po
3041$ msginit
3042@end example
3043
3044The alternative way is to do the copy and modifications by hand.
3045To do so, the translator copies @file{@var{package}.pot} to
3046@file{@var{LANG}.po}.  Then she modifies the initial comments and
3047the header entry of this file.
3048
3049@menu
3050* msginit Invocation::          Invoking the @code{msginit} Program
3051* Header Entry::                Filling in the Header Entry
3052@end menu
3053
3054@node msginit Invocation
3055@section Invoking the @code{msginit} Program
3056
3057@include msginit.texi
3058
3059@node Header Entry
3060@section Filling in the Header Entry
3061@cindex header entry of a PO file
3062
3063The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
3064"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
3065information.  This can be done in any text editor; if Emacs is used
3066and it switched to PO mode automatically (because it has recognized
3067the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.
3068
3069Modifying the header entry can already be done using PO mode: in Emacs,
3070type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
3071entry.  You should fill in the following fields.
3072
3073@table @asis
3074@item Project-Id-Version
3075This is the name and version of the package.  Fill it in if it has not
3076already been filled in by @code{xgettext}.
3077
3078@item Report-Msgid-Bugs-To
3079This has already been filled in by @code{xgettext}.  It contains an email
3080address or URL where you can report bugs in the untranslated strings:
3081
3082@itemize -
3083@item Strings which are not entire sentences, see the maintainer guidelines
3084in @ref{Preparing Strings}.
3085@item Strings which use unclear terms or require additional context to be
3086understood.
3087@item Strings which make invalid assumptions about notation of date, time or
3088money.
3089@item Pluralisation problems.
3090@item Incorrect English spelling.
3091@item Incorrect formatting.
3092@end itemize
3093
3094@item POT-Creation-Date
3095This has already been filled in by @code{xgettext}.
3096
3097@item PO-Revision-Date
3098You don't need to fill this in.  It will be filled by the PO file editor
3099when you save the file.
3100
3101@item Last-Translator
3102Fill in your name and email address (without double quotes).
3103
3104@item Language-Team
3105Fill in the English name of the language, and the email address or
3106homepage URL of the language team you are part of.
3107
3108Before starting a translation, it is a good idea to get in touch with
3109your translation team, not only to make sure you don't do duplicated work,
3110but also to coordinate difficult linguistic issues.
3111
3112@cindex list of translation teams, where to find
3113In the Free Translation Project, each translation team has its own mailing
3114list.  The up-to-date list of teams can be found at the Free Translation
3115Project's homepage, @uref{https://translationproject.org/}, in the "Teams"
3116area.
3117
3118@item Language
3119@c The purpose of this field is to make it possible to automatically
3120@c - convert PO files to translation memory,
3121@c - initialize a spell checker based on the PO file,
3122@c - perform language specific checks.
3123Fill in the language code of the language.  This can be in one of three
3124forms:
3125
3126@itemize -
3127@item
3128@samp{@var{ll}}, an @w{ISO 639} two-letter language code (lowercase).
3129See @ref{Language Codes} for the list of codes.
3130
3131@item
3132@samp{@var{ll}_@var{CC}}, where @samp{@var{ll}} is an @w{ISO 639} two-letter
3133language code (lowercase) and @samp{@var{CC}} is an @w{ISO 3166} two-letter
3134country code (uppercase).  The country code specification is not redundant:
3135Some languages have dialects in different countries.  For example,
3136@samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil.  The country
3137code serves to distinguish the dialects. See @ref{Language Codes} and
3138@ref{Country Codes} for the lists of codes.
3139
3140@item
3141@samp{@var{ll}_@var{CC}@@@var{variant}}, where @samp{@var{ll}} is an
3142@w{ISO 639} two-letter language code (lowercase), @samp{@var{CC}} is an
3143@w{ISO 3166} two-letter country code (uppercase), and @samp{@var{variant}} is
3144a variant designator. The variant designator (lowercase) can be a script
3145designator, such as @samp{latin} or @samp{cyrillic}.
3146@end itemize
3147
3148The naming convention @samp{@var{ll}_@var{CC}} is also the way locales are
3149named on systems based on GNU libc.  But there are three important differences:
3150
3151@itemize @bullet
3152@item
3153In this PO file field, but not in locale names, @samp{@var{ll}_@var{CC}}
3154combinations denoting a language's main dialect are abbreviated as
3155@samp{@var{ll}}.  For example, @samp{de} is equivalent to @samp{de_DE}
3156(German as spoken in Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as
3157spoken in Portugal) in this context.
3158
3159@item
3160In this PO file field, suffixes like @samp{.@var{encoding}} are not used.
3161
3162@item
3163In this PO file field, variant designators that are not relevant to message
3164translation, such as @samp{@@euro}, are not used.
3165@end itemize
3166
3167So, if your locale name is @samp{de_DE.UTF-8}, the language specification in
3168PO files is just @samp{de}.
3169
3170@item Content-Type
3171@cindex encoding of PO files
3172@cindex charset of PO files
3173Replace @samp{CHARSET} with the character encoding used for your language,
3174in your locale, or UTF-8.  This field is needed for correct operation of the
3175@code{msgmerge} and @code{msgfmt} programs, as well as for users whose
3176locale's character encoding differs from yours (see @ref{Charset conversion}).
3177
3178@cindex @code{locale} program
3179You get the character encoding of your locale by running the shell command
3180@samp{locale charmap}.  If the result is @samp{C} or @samp{ANSI_X3.4-1968},
3181which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
3182locale is not correctly configured.  In this case, ask your translation
3183team which charset to use.  @samp{ASCII} is not usable for any language
3184except Latin.
3185
3186@cindex encoding list
3187Because the PO files must be portable to operating systems with less advanced
3188internationalization facilities, the character encodings that can be used
3189are limited to those supported by both GNU @code{libc} and GNU
3190@code{libiconv}.  These are:
3191@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
3192@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
3193@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14},
3194@code{ISO-8859-15},
3195@code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T},
3196@code{CP850}, @code{CP866}, @code{CP874},
3197@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
3198@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
3199@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
3200@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
3201@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}.
3202
3203@c This data is taken from glibc/localedata/SUPPORTED.
3204@cindex Linux
3205In the GNU system, the following encodings are frequently used for the
3206corresponding languages.
3207
3208@cindex encoding for your language
3209@itemize
3210@item @code{ISO-8859-1} for
3211Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
3212English, Estonian, Faroese, Finnish, French, Galician, German,
3213Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
3214Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
3215Walloon,
3216@item @code{ISO-8859-2} for
3217Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
3218Slovenian,
3219@item @code{ISO-8859-3} for Maltese,
3220@item @code{ISO-8859-5} for Macedonian, Serbian,
3221@item @code{ISO-8859-6} for Arabic,
3222@item @code{ISO-8859-7} for Greek,
3223@item @code{ISO-8859-8} for Hebrew,
3224@item @code{ISO-8859-9} for Turkish,
3225@item @code{ISO-8859-13} for Latvian, Lithuanian, Maori,
3226@item @code{ISO-8859-14} for Welsh,
3227@item @code{ISO-8859-15} for
3228Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
3229Italian, Portuguese, Spanish, Swedish, Walloon,
3230@item @code{KOI8-R} for Russian,
3231@item @code{KOI8-U} for Ukrainian,
3232@item @code{KOI8-T} for Tajik,
3233@item @code{CP1251} for Bulgarian, Belarusian,
3234@item @code{GB2312}, @code{GBK}, @code{GB18030}
3235for simplified writing of Chinese,
3236@item @code{BIG5}, @code{BIG5-HKSCS}
3237for traditional writing of Chinese,
3238@item @code{EUC-JP} for Japanese,
3239@item @code{EUC-KR} for Korean,
3240@item @code{TIS-620} for Thai,
3241@item @code{GEORGIAN-PS} for Georgian,
3242@item @code{UTF-8} for any language, including those listed above.
3243@end itemize
3244
3245@cindex quote characters, use in PO files
3246@cindex quotation marks
3247When single quote characters or double quote characters are used in
3248translations for your language, and your locale's encoding is one of the
3249ISO-8859-* charsets, it is best if you create your PO files in UTF-8
3250encoding, instead of your locale's encoding.  This is because in UTF-8
3251the real quote characters can be represented (single quote characters:
3252U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
3253ISO-8859-* charsets has them all.  Users in UTF-8 locales will see the
3254real quote characters, whereas users in ISO-8859-* locales will see the
3255vertical apostrophe and the vertical double quote instead (because that's
3256what the character set conversion will transliterate them to).
3257
3258@cindex @code{xmodmap} program, and typing quotation marks
3259To enter such quote characters under X11, you can change your keyboard
3260mapping using the @code{xmodmap} program.  The X11 names of the quote
3261characters are "leftsinglequotemark", "rightsinglequotemark",
3262"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
3263"doublelowquotemark".
3264
3265Note that only recent versions of GNU Emacs support the UTF-8 encoding:
3266Emacs 20 with Mule-UCS, and Emacs 21.  As of January 2001, XEmacs doesn't
3267support the UTF-8 encoding.
3268
3269The character encoding name can be written in either upper or lower case.
3270Usually upper case is preferred.
3271
3272@item Content-Transfer-Encoding
3273Set this to @code{8bit}.
3274
3275@item Plural-Forms
3276This field is optional.  It is only needed if the PO file has plural forms.
3277You can find them by searching for the @samp{msgid_plural} keyword.  The
3278format of the plural forms field is described in @ref{Plural forms} and
3279@ref{Translating plural forms}.
3280@end table
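
The rules given above for the @samp{Language} field can be sketched as a
small conversion from a locale name.  The function name
@code{po_language_from_locale} is hypothetical, the table of main
dialects holds only the two pairs mentioned above, and @samp{@@euro} is
the only variant treated as irrelevant; a complete implementation would
need fuller lists.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative, incomplete table: ll_CC names that denote a language's
   main dialect.  Only the two pairs named in the text are listed.  */
static const char *const main_dialects[] = { "de_DE", "pt_PT" };

/* Hypothetical helper: derives a PO "Language" value from a locale name of
   the form ll[_CC][.encoding][@variant].  Returns 0 on success, -1 if the
   result does not fit into BUF.  */
int
po_language_from_locale (const char *locale, char *buf, size_t size)
{
  size_t base_len = strcspn (locale, ".@");
  const char *variant = strchr (locale, '@');
  size_t i;
  int n;

  /* Abbreviate ll_CC to ll for main dialects.  */
  for (i = 0; i < sizeof main_dialects / sizeof main_dialects[0]; i++)
    if (strlen (main_dialects[i]) == base_len
        && strncmp (locale, main_dialects[i], base_len) == 0)
      {
        base_len = strcspn (locale, "_");
        break;
      }

  /* Drop variants that do not matter for message translation.  */
  if (variant != NULL && strcmp (variant, "@euro") == 0)
    variant = NULL;

  /* The encoding suffix is never copied.  */
  n = snprintf (buf, size, "%.*s%s", (int) base_len, locale,
                variant != NULL ? variant : "");
  return n >= 0 && (size_t) n < size ? 0 : -1;
}
```

For the locale name @samp{de_DE.UTF-8} this produces @samp{de}, while
@samp{pt_BR.UTF-8} becomes @samp{pt_BR} because Brazilian Portuguese is
not listed as a main dialect.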
3281
3282@node Updating
3283@chapter Updating Existing PO Files
3284
3285@menu
3286* msgmerge Invocation::         Invoking the @code{msgmerge} Program
3287@end menu
3288
3289@node msgmerge Invocation
3290@section Invoking the @code{msgmerge} Program
3291
3292@include msgmerge.texi
3293
3294@node Editing
3295@chapter Editing PO Files
3296@cindex Editing PO Files
3297
3298@menu
3299* KBabel::                      KDE's PO File Editor
3300* Gtranslator::                 GNOME's PO File Editor
3301* PO Mode::                     Emacs's PO File Editor
3302* Compendium::                  Using Translation Compendia
3303@end menu
3304
3305@node KBabel
3306@section KDE's PO File Editor
3307@cindex KDE PO file editor
3308
3309@node Gtranslator
3310@section GNOME's PO File Editor
3311@cindex GNOME PO file editor
3312
3313@node PO Mode
3314@section Emacs's PO File Editor
3315@cindex Emacs PO Mode
3316
3317@c FIXME: Rewrite.
3318
For the lucky users of Emacs, PO mode was created specifically to
provide a comfortable environment for editing PO files.
While editing a PO file, PO mode makes it easy to browse auxiliary and
compendium PO files, and to follow references into the set of C program
sources from which the PO files were derived.
It also has a few special features, among them the interactive marking
of program strings as translatable, and the validation of PO files
with easy repositioning to the PO file lines that show errors.

To get started, besides the main PO mode commands
(@pxref{Main PO Commands}), you should know how to move between entries
(@pxref{Entry Positioning}) and how to handle untranslated entries
(@pxref{Untranslated Entries}).
3333
3334@menu
3335* Installation::                Completing GNU @code{gettext} Installation
3336* Main PO Commands::            Main Commands
3337* Entry Positioning::           Entry Positioning
3338* Normalizing::                 Normalizing Strings in Entries
3339* Translated Entries::          Translated Entries
3340* Fuzzy Entries::               Fuzzy Entries
3341* Untranslated Entries::        Untranslated Entries
3342* Obsolete Entries::            Obsolete Entries
3343* Modifying Translations::      Modifying Translations
3344* Modifying Comments::          Modifying Comments
3345* Subedit::                     Mode for Editing Translations
3346* C Sources Context::           C Sources Context
3347* Auxiliary::                   Consulting Auxiliary PO Files
3348@end menu
3349
3350@node Installation
3351@subsection Completing GNU @code{gettext} Installation
3352
3353@cindex installing @code{gettext}
3354@cindex @code{gettext} installation
3355Once you have received, unpacked, configured and compiled the GNU
3356@code{gettext} distribution, the @samp{make install} command puts in
3357place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
3358@code{msgmerge}, as well as their available message catalogs.  To
3359top off a comfortable installation, you might also want to make the
3360PO mode available to your Emacs users.
3361
3362@emindex @file{.emacs} customizations
3363@emindex installing PO mode
3364During the installation of the PO mode, you might want to modify your
3365file @file{.emacs}, once and for all, so it contains a few lines looking
3366like:
3367
3368@example
3369(setq auto-mode-alist
3370      (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
3371(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
3372@end example
3373
3374Later, whenever you edit some @file{.po}
3375file, or any file having the string @samp{.po.} within its name,
3376Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
3377automatically activates PO mode commands for the associated buffer.
3378The string @emph{PO} appears in the mode line for any buffer for
3379which PO mode is active.  Many PO files may be active at once in a
3380single Emacs session.
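
To make the file name rule concrete, here is the same test written in C.
This is only an illustration of what the pattern
@samp{\\.po\\'\\|\\.po\\.} accepts; the function name
@code{is_po_file_name} is hypothetical and nothing in Emacs or
@code{gettext} uses it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring the Emacs pattern "\\.po\\'\\|\\.po\\.":
   true if NAME ends in ".po" or contains ".po." anywhere.  */
bool
is_po_file_name (const char *name)
{
  size_t len = strlen (name);

  if (len >= 3 && strcmp (name + len - 3, ".po") == 0)
    return true;
  return strstr (name, ".po.") != NULL;
}
```

A name such as @file{de.po} or @file{de.po.orig} is accepted, while a
template file such as @file{gettext.pot} is not.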
3381
3382If you are using Emacs version 20 or newer, and have already installed
3383the appropriate international fonts on your system, you may also tell
3384Emacs how to determine automatically the coding system of every PO file.
3385This will often (but not always) cause the necessary fonts to be loaded
3386and used for displaying the translations on your Emacs screen.  For this
3387to happen, add the lines:
3388
3389@example
3390(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
3391                            'po-find-file-coding-system)
3392(autoload 'po-find-file-coding-system "po-mode")
3393@end example
3394
3395@noindent
3396to your @file{.emacs} file.  If, with this, you still see boxes instead
3397of international characters, try a different font set (via Shift Mouse
3398button 1).
3399
3400@node Main PO Commands
3401@subsection Main PO mode Commands
3402
3403@cindex PO mode (Emacs) commands
3404@emindex commands
After setting up Emacs with something similar to the lines in
@ref{Installation}, PO mode is activated for a window when Emacs finds a
PO file in that window.  This makes the window read-only and establishes
@code{po-mode-map}.  PO mode is a genuine Emacs major mode and is not
derived from text mode in any way.  Functions found on
@code{po-mode-hook}, if any, will be executed.
3411
When PO mode is active in a window, the letters @samp{PO} appear
in the mode line for that window.  The mode line also displays how
many entries of each kind are held in the PO file.  For example,
the string @samp{132t+3f+10u+2o} would tell the translator that the
PO file contains 132 translated entries (@pxref{Translated Entries}),
3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
Entries}).  Items with a zero count are not shown.  So, in this example, if
the fuzzy entries were unfuzzied, the untranslated entries were translated
and the obsolete entries were deleted, the mode line would merely display
@samp{145t} for the counters.
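
The format of the counter string can be illustrated with a short C
function.  PO mode itself is written in Emacs Lisp, so this is not its
code; the name @code{format_po_counters} is hypothetical, and returning
an empty string when all counts are zero is an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: formats entry counts the way the text describes the
   mode line, e.g. "132t+3f+10u+2o", omitting kinds whose count is zero.
   Returns BUF.  An all-zero input yields an empty string, which is an
   assumption; the text does not cover that case.  */
char *
format_po_counters (char *buf, size_t size, unsigned translated,
                    unsigned fuzzy, unsigned untranslated, unsigned obsolete)
{
  const unsigned counts[] = { translated, fuzzy, untranslated, obsolete };
  const char suffixes[] = { 't', 'f', 'u', 'o' };
  size_t used = 0;
  size_t i;

  if (size > 0)
    buf[0] = '\0';
  for (i = 0; i < 4 && used < size; i++)
    {
      int n;

      if (counts[i] == 0)
        continue;
      /* Separate items with "+", but not before the first one shown.  */
      n = snprintf (buf + used, size - used, "%s%u%c",
                    used > 0 ? "+" : "", counts[i], suffixes[i]);
      if (n < 0)
        break;
      used += (size_t) n;
    }
  return buf;
}
```

Calling it with the counts 132, 3, 10 and 2 reproduces the string from
the example, and with 145, 0, 0 and 0 it yields just @samp{145t}.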
3423
3424The main PO commands are those which do not fit into the other categories of
3425subsequent sections.  These allow for quitting PO mode or for managing windows
3426in special ways.
3427
3428@table @kbd
3429@item _
3430@efindex _@r{, PO Mode command}
3431Undo last modification to the PO file (@code{po-undo}).
3432
3433@item Q
3434@efindex Q@r{, PO Mode command}
3435Quit processing and save the PO file (@code{po-quit}).
3436
3437@item q
3438@efindex q@r{, PO Mode command}
3439Quit processing, possibly after confirmation (@code{po-confirm-and-quit}).
3440
3441@item 0
3442@efindex 0@r{, PO Mode command}
Temporarily leave the PO file window (@code{po-other-window}).
3444
3445@item ?
3446@itemx h
3447@efindex ?@r{, PO Mode command}
3448@efindex h@r{, PO Mode command}
3449Show help about PO mode (@code{po-help}).
3450
3451@item =
3452@efindex =@r{, PO Mode command}
3453Give some PO file statistics (@code{po-statistics}).
3454
3455@item V
3456@efindex V@r{, PO Mode command}
3457Batch validate the format of the whole PO file (@code{po-validate}).
3458
3459@end table
3460
3461@efindex _@r{, PO Mode command}
3462@efindex po-undo@r{, PO Mode command}
The command @kbd{_} (@code{po-undo}) interfaces to the Emacs
@emph{undo} facility.  @xref{Undo, , Undoing Changes, emacs, The Emacs
Editor}.  Each time @kbd{_} is typed, modifications which the translator
made to the PO file are undone a little more.  For the purpose of
undoing, each PO mode command is atomic.  This is especially true for
the @kbd{@key{RET}} command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
involved several actions.  However, while in the editing window, one
can undo the editing work in much finer steps.
3472
3473@efindex Q@r{, PO Mode command}
3474@efindex q@r{, PO Mode command}
3475@efindex po-quit@r{, PO Mode command}
3476@efindex po-confirm-and-quit@r{, PO Mode command}
3477The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
3478(@code{po-confirm-and-quit}) are used when the translator is done with the
3479PO file.  The former is a bit less verbose than the latter.  If the file
3480has been modified, it is saved to disk first.  In both cases, and prior to
3481all this, the commands check if any untranslated messages remain in the
3482PO file and, if so, the translator is asked if she really wants to leave
3483off working with this PO file.  This is the preferred way of getting rid
3484of an Emacs PO file buffer.  Merely killing it through the usual command
3485@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.
3486
3487@efindex 0@r{, PO Mode command}
3488@efindex po-other-window@r{, PO Mode command}
The command @kbd{0} (@code{po-other-window}) is another, softer way
to leave PO mode temporarily.  It just moves the cursor to some other
Emacs window, and pops one up if necessary.  For example, if the translator
has just had PO mode show some source context in another window, she might
discover an apparent bug in the program source that needs correction.
This command allows the translator to switch roles, act as a programmer,
and have the cursor placed right in the window containing the program she
wants to modify.  PO mode is recovered later by moving the cursor back
to the PO file window, or by asking Emacs to edit this file once again.
3499
3500@efindex ?@r{, PO Mode command}
3501@efindex h@r{, PO Mode command}
3502@efindex po-help@r{, PO Mode command}
3503The command @kbd{h} (@code{po-help}) displays a summary of all available PO
3504mode commands.  The translator should then type any character to resume
3505normal PO mode operations.  The command @kbd{?} has the same effect
3506as @kbd{h}.
3507
3508@efindex =@r{, PO Mode command}
3509@efindex po-statistics@r{, PO Mode command}
3510The command @kbd{=} (@code{po-statistics}) computes the total number of
3511entries in the PO file, the ordinal of the current entry (counted from
35121), the number of untranslated entries, the number of obsolete entries,
3513and displays all these numbers.
3514
3515@efindex V@r{, PO Mode command}
3516@efindex po-validate@r{, PO Mode command}
The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in
checking and verbose mode over the current PO file.  This command first
offers to save the current PO file on disk.  The @code{msgfmt} tool, from
GNU @code{gettext}, has the purpose of creating an MO file out of a PO file,
and PO mode uses the features of this program for checking the overall
format of a PO file, as well as all individual entries.
3524
3525@efindex next-error@r{, stepping through PO file validation results}
3526The program @code{msgfmt} runs asynchronously with Emacs, so the
3527translator regains control immediately while her PO file is being studied.
3528Error output is collected in the Emacs @samp{*compilation*} buffer,
displayed in another window.  The regular Emacs command @w{@kbd{C-x `}}
(@code{next-error}), as well as other usual compile commands, allow the
translator to reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.
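
A comparable check can be run outside Emacs.  The following Python sketch
is only an illustration: it relies on the documented @code{msgfmt} options
@code{--check}, @code{--verbose} and @code{-o}, but the exact command line
that PO mode builds is not stated here, so the argument list, the use of
the null device as output file, and the helper names
@code{msgfmt_check_command} and @code{validate} are assumptions.  Unlike
PO mode, the sketch waits for @code{msgfmt} to finish.

```python
import os
import shutil
import subprocess

def msgfmt_check_command(po_path):
    """Build an argv list that asks msgfmt to check a PO file verbosely."""
    # Sending the MO output to the null device is this sketch's choice,
    # so that no MO file is written next to the PO file.
    return ["msgfmt", "--check", "--verbose", "-o", os.devnull, po_path]

def validate(po_path):
    """Run the check if msgfmt is installed; return (status, diagnostics)."""
    if shutil.which("msgfmt") is None:
        return None, "msgfmt not found on PATH"
    done = subprocess.run(msgfmt_check_command(po_path),
                          capture_output=True, text=True)
    # msgfmt writes its diagnostics to standard error.
    return done.returncode, done.stderr
```

Calling @code{validate("de.po")} with a hypothetical file name returns the
exit status together with the diagnostics that PO mode would instead collect
in the @samp{*compilation*} buffer.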
3534
3535@node Entry Positioning
3536@subsection Entry Positioning
3537
3538@emindex current entry of a PO file
3539The cursor in a PO file window is almost always part of
3540an entry.  The only exceptions are the special case when the cursor
3541is after the last entry in the file, or when the PO file is
3542empty.  The entry where the cursor is found to be is said to be the
current entry.  Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file: it also selects the entry on which commands operate.
3546
3547@emindex moving through a PO file
Some PO mode commands alter the position of the cursor in a specialized
way.  A few of those special-purpose positioning commands are described here;
the others are described in following sections (for a complete list, try
@kbd{C-h m}):
3552
3553@table @kbd
3554
3555@item .
3556@efindex .@r{, PO Mode command}
3557Redisplay the current entry (@code{po-current-entry}).
3558
3559@item n
3560@efindex n@r{, PO Mode command}
3561Select the entry after the current one (@code{po-next-entry}).
3562
3563@item p
3564@efindex p@r{, PO Mode command}
3565Select the entry before the current one (@code{po-previous-entry}).
3566
3567@item <
3568@efindex <@r{, PO Mode command}
3569Select the first entry in the PO file (@code{po-first-entry}).
3570
3571@item >
3572@efindex >@r{, PO Mode command}
3573Select the last entry in the PO file (@code{po-last-entry}).
3574
3575@item m
3576@efindex m@r{, PO Mode command}
3577Record the location of the current entry for later use
3578(@code{po-push-location}).
3579
3580@item r
3581@efindex r@r{, PO Mode command}
3582Return to a previously saved entry location (@code{po-pop-location}).
3583
3584@item x
3585@efindex x@r{, PO Mode command}
3586Exchange the current entry location with the previously saved one
3587(@code{po-exchange-location}).
3588
3589@end table
3590
3591@efindex .@r{, PO Mode command}
3592@efindex po-current-entry@r{, PO Mode command}
3593Any Emacs command able to reposition the cursor may be used
3594to select the current entry in PO mode, including commands which
3595move by characters, lines, paragraphs, screens or pages, and search
3596commands.  However, there is a kind of standard way to display the
3597current entry in PO mode, which usual Emacs commands moving
3598the cursor do not especially try to enforce.  The command @kbd{.}
3599(@code{po-current-entry}) has the sole purpose of redisplaying the
3600current entry properly, after the current entry has been changed by
3601means external to PO mode, or the Emacs screen otherwise altered.
3602
3603It is yet to be decided if PO mode helps the translator, or otherwise
3604irritates her, by forcing a rigid window disposition while she
3605is doing her work.  We originally had quite precise ideas about
3606how windows should behave, but on the other hand, anyone used to
3607Emacs is often happy to keep full control.  Maybe a fixed window
3608disposition might be offered as a PO mode option that the translator
3609might activate or deactivate at will, so it could be offered on an
3610experimental basis.  If nobody feels a real need for using it, or
3611a compulsion for writing it, we should drop this whole idea.
The incentive for doing it should come from translators rather than
programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers @emph{thinking} about
how @emph{others} should do translation.
3616
3617@efindex n@r{, PO Mode command}
3618@efindex po-next-entry@r{, PO Mode command}
3619@efindex p@r{, PO Mode command}
3620@efindex po-previous-entry@r{, PO Mode command}
3621The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
(@code{po-previous-entry}) move the cursor to the entry following,
3623or preceding, the current one.  If @kbd{n} is given while the
3624cursor is on the last entry of the PO file, or if @kbd{p}
3625is given while the cursor is on the first entry, no move is done.
3626
3627@efindex <@r{, PO Mode command}
3628@efindex po-first-entry@r{, PO Mode command}
3629@efindex >@r{, PO Mode command}
3630@efindex po-last-entry@r{, PO Mode command}
3631The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
3632(@code{po-last-entry}) move the cursor to the first entry, or last
3633entry, of the PO file.  When the cursor is located past the last
3634entry in a PO file, most PO mode commands will return an error saying
3635@samp{After last entry}.  Moreover, the commands @kbd{<} and @kbd{>}
3636have the special property of being able to work even when the cursor
is not within any PO file entry, and one may use them for nicely
correcting this situation.  But even these commands will fail on a
truly empty PO file.  There are development plans for PO mode
to interactively fill an empty PO file from sources.  @xref{Marking}.
3641
3642The translator may decide, before working at the translation of
3643a particular entry, that she needs to browse the remainder of the
3644PO file, maybe for finding the terminology or phraseology used
3645in related entries.  She can of course use the standard Emacs idioms
3646for saving the current cursor location in some register, and use that
3647register for getting back, or else, use the location ring.
3648
3649@efindex m@r{, PO Mode command}
3650@efindex po-push-location@r{, PO Mode command}
3651@efindex r@r{, PO Mode command}
3652@efindex po-pop-location@r{, PO Mode command}
3653PO mode offers another approach, by which cursor locations may be saved
3654onto a special stack.  The command @kbd{m} (@code{po-push-location})
3655merely adds the location of current entry to the stack, pushing
3656the already saved locations under the new one.  The command
3657@kbd{r} (@code{po-pop-location}) consumes the top stack element and
3658repositions the cursor to the entry associated with that top element.
3659This position is then lost, for the next @kbd{r} will move the cursor
3660to the previously saved location, and so on until no locations remain
3661on the stack.
3662
3663If the translator wants the position to be kept on the location stack,
3664maybe for taking a look at the entry associated with the top
3665element, then go elsewhere with the intent of getting back later, she
3666ought to use @kbd{m} immediately after @kbd{r}.
3667
3668@efindex x@r{, PO Mode command}
3669@efindex po-exchange-location@r{, PO Mode command}
3670The command @kbd{x} (@code{po-exchange-location}) simultaneously
3671repositions the cursor to the entry associated with the top element of
3672the stack of saved locations, and replaces that top element with the
3673location of the current entry before the move.  Consequently, repeating
the @kbd{x} command toggles alternately between two entries.
3675For achieving this, the translator will position the cursor on the
3676first entry, use @kbd{m}, then position to the second entry, and
3677merely use @kbd{x} for making the switch.
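
The behaviour of @kbd{m}, @kbd{r} and @kbd{x} can be modelled with an
ordinary list used as a stack.  The Python sketch below is only a toy
model, not the PO mode implementation: the class name
@code{LocationStack}, the use of integers as entry locations, and the
choice to stay in place when the stack is empty are all assumptions made
for illustration.

```python
class LocationStack:
    """Toy model of the PO mode stack of saved entry locations."""

    def __init__(self):
        self.saved = []  # the last element is the top of the stack

    def push(self, current):
        """Model of m: remember the current entry and stay on it."""
        self.saved.append(current)
        return current

    def pop(self, current):
        """Model of r: move to the top element and forget it."""
        if not self.saved:
            return current  # nothing saved; this sketch simply stays put
        return self.saved.pop()

    def exchange(self, current):
        """Model of x: move to the top element, saving where we came from."""
        if not self.saved:
            return current
        target = self.saved[-1]
        self.saved[-1] = current
        return target


stack = LocationStack()
here = stack.push(3)          # m while on entry 3
here = 17                     # move to entry 17 by any other means
here = stack.exchange(here)   # x: cursor on 3, the stack now holds 17
here = stack.exchange(here)   # x again: cursor on 17, the stack holds 3
```

Because @code{exchange} replaces the top element instead of consuming it,
repeating it alternates between the same two entries, whereas each
@code{pop} shortens the stack by one.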
3678
3679@node Normalizing
3680@subsection Normalizing Strings in Entries
3681@cindex string normalization in entries
3682
3683There are many different ways for encoding a particular string into a
3684PO file entry, because there are so many different ways to split and
quote multi-line strings, and even to represent special characters
by backslash escape sequences.  Some features of PO mode rely on
3687the ability for PO mode to scan an already existing PO file for a
3688particular string encoded into the @code{msgid} field of some entry.
3689Even if PO mode has internally all the built-in machinery for
3690implementing this recognition easily, doing it fast is technically
3691difficult.  To facilitate a solution to this efficiency problem,
3692we decided on a canonical representation for strings.
3693
3694A conventional representation of strings in a PO file is currently
3695under discussion, and PO mode experiments with a canonical representation.
3696Having both @code{xgettext} and PO mode converging towards a uniform
3697way of representing equivalent strings would be useful, as the internal
3698normalization needed by PO mode could be automatically satisfied
3699when using @code{xgettext} from GNU @code{gettext}.  An explicit
3700PO mode normalization should then be only necessary for PO files
3701imported from elsewhere, or for when the convention itself evolves.
3702
3703So, for achieving normalization of at least the strings of a given
3704PO file needing a canonical representation, the following PO mode
3705command is available:
3706
3707@emindex string normalization in entries
3708@table @kbd
3709@item M-x po-normalize
3710@efindex po-normalize@r{, PO Mode command}
3711Tidy the whole PO file by making entries more uniform.
3712
3713@end table
3714
3715The special command @kbd{M-x po-normalize}, which has no associated
3716keys, revises all entries, ensuring that strings of both original
3717and translated entries use uniform internal quoting in the PO file.
3718It also removes any crumb after the last entry.  This command may be
3719useful for PO files freshly imported from elsewhere, or if we ever
3720improve on the canonical quoting format we use.  This canonical format
3721is not only meant for getting cleaner PO files, but also for greatly
3722speeding up @code{msgid} string lookup for some other PO mode commands.
3723
3724@kbd{M-x po-normalize} presently makes three passes over the entries.
3725The first implements heuristics for converting PO files for GNU
3726@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
3727fields were using K&R style C string syntax for multi-line strings.
3728These heuristics may fail for comments not related to obsolete
3729entries and ending with a backslash; they also depend on subsequent
3730passes for finalizing the proper commenting of continued lines for
obsolete entries.  This first pass might disappear once all oldish PO
files have been adjusted.  The second and third passes normalize
3733all @code{msgid} and @code{msgstr} strings respectively.  They also
3734clean out those trailing backslashes used by XView's @code{msgfmt}
3735for continued lines.
3736
3737@cindex importing PO files
3738Having such an explicit normalizing command allows for importing PO
3739files from other sources, but also eases the evolution of the current
3740convention, evolution driven mostly by aesthetic concerns, as of now.
3741It is easy to make suggested adjustments at a later time, as the
3742normalizing command and eventually, other GNU @code{gettext} tools
3743should greatly automate conformance.  A description of the canonical
3744string format is given below, for the particular benefit of those not
3745having Emacs handy, and who would nevertheless want to handcraft
3746their PO files in nice ways.
3747
3748@cindex multi-line strings
3749Right now, in PO mode, strings are single line or multi-line.  A string
3750goes multi-line if and only if it has @emph{embedded} newlines, that
3751is, if it matches @samp{[^\n]\n+[^\n]}.  So, we would have:
3752
3753@example
3754msgstr "\n\nHello, world!\n\n\n"
3755@end example
3756
3757but, replacing the space by a newline, this becomes:
3758
3759@example
3760msgstr ""
3761"\n"
3762"\n"
3763"Hello,\n"
3764"world!\n"
3765"\n"
3766"\n"
3767@end example
3768
3769We are deliberately using a caricatural example, here, to make the
3770point clearer.  Usually, multi-lines are not that bad looking.
3771It is probable that we will implement the following suggestion.
3772We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, for a run of
@w{@var{n} > 1} newlines, the last @var{n}-1 newlines would go together
in a separate string), so making the previous example appear:
3776
3777@example
3778msgstr "\n\n"
3779"Hello,\n"
3780"world!\n"
3781"\n\n"
3782@end example
3783
3784There are a few yet undecided little points about string normalization,
3785to be documented in this manual, once these questions settle.
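
For readers who want to experiment with the current rule (not the
suggested refinement), here is a minimal Python sketch.  It assumes that
only backslash, double quote, newline and tab need escaping, and the
function names @code{quote} and @code{format_field} are hypothetical; it is
not the code used by @kbd{M-x po-normalize}.

```python
import re

def quote(piece):
    """Wrap one piece in double quotes, escaping it as in a C string."""
    escaped = (piece.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\t", "\\t"))
    return '"' + escaped + '"'

def format_field(keyword, text):
    """Lay out one msgid or msgstr field following the rule given above."""
    # A string goes multi-line only if some newline sits between two
    # characters that are not newlines themselves.
    if not re.search(r"[^\n]\n+[^\n]", text):
        return keyword + " " + quote(text)
    # Multi-line form: an empty first string, then one piece per newline.
    pieces = re.findall(r"[^\n]*\n|[^\n]+", text)
    return "\n".join([keyword + ' ""'] + [quote(p) for p in pieces])
```

Applied to the two strings of the example above, with and without the
space replaced by a newline, @code{format_field} yields the single-line and
the seven-line layouts shown there.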
3786
3787@node Translated Entries
3788@subsection Translated Entries
3789@cindex translated entries
3790
3791Each PO file entry for which the @code{msgstr} field has been filled with
3792a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
3793is said to be a @dfn{translated} entry.  Only translated entries will
3794later be compiled by GNU @code{msgfmt} and become usable in programs.
3795Other entry types will be excluded; translation will not occur for them.
3796
3797@emindex moving by translated entries
3798Some commands are more specifically related to translated entry processing.
3799
3800@table @kbd
3801@item t
3802@efindex t@r{, PO Mode command}
3803Find the next translated entry (@code{po-next-translated-entry}).
3804
3805@item T
3806@efindex T@r{, PO Mode command}
3807Find the previous translated entry (@code{po-previous-translated-entry}).
3808
3809@end table
3810
3811@efindex t@r{, PO Mode command}
3812@efindex po-next-translated-entry@r{, PO Mode command}
3813@efindex T@r{, PO Mode command}
3814@efindex po-previous-translated-entry@r{, PO Mode command}
3815The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T}
3816(@code{po-previous-translated-entry}) move forwards or backwards, chasing
for a translated entry.  If none is found, the search is extended and
3818wraps around in the PO file buffer.
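
The wrap-around search can be pictured as follows.  This Python sketch is
an illustration under stated assumptions: entries are represented by a
plain list of labels, the function name @code{find_next} is hypothetical,
and returning the current entry when it is the only match is a choice of
the sketch, not documented PO mode behaviour.

```python
def find_next(kinds, current, wanted, step=1):
    """Return the index of the nearest entry of the wanted kind.

    kinds   -- list of labels, one per entry, such as "translated"
    current -- index of the current entry
    step    -- +1 to search forwards (t), -1 to search backwards (T)
    """
    count = len(kinds)
    for distance in range(1, count + 1):
        # The modulo makes the search wrap around either end of the list.
        candidate = (current + step * distance) % count
        if kinds[candidate] == wanted:
            return candidate
    return None  # no entry of that kind anywhere in the file
```

The same loop serves for the other entry kinds described in the following
sections, by passing a different @code{wanted} label.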
3819
3820@evindex po-auto-fuzzy-on-edit@r{, PO Mode variable}
3821Translated entries usually result from the translator having edited in
a translation for them (@pxref{Modifying Translations}).  However, if the
3823variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, the entry having
3824received a new translation first becomes a fuzzy entry, which ought to
3825be later unfuzzied before becoming an official, genuine translated entry.
3826@xref{Fuzzy Entries}.
3827
3828@node Fuzzy Entries
3829@subsection Fuzzy Entries
3830@cindex fuzzy entries
3831
3832@cindex attributes of a PO file entry
3833@cindex attribute, fuzzy
3834Each PO file entry may have a set of @dfn{attributes}, which are
3835qualities given a name and explicitly associated with the translation,
3836using a special system comment.  One of these attributes
3837has the name @code{fuzzy}, and entries having this attribute are said
3838to have a fuzzy translation.  They are called fuzzy entries, for short.
3839
3840Fuzzy entries, even if they account for translated entries for
3841most other purposes, usually call for revision by the translator.
3842Those may be produced by applying the program @code{msgmerge} to
update an older translated PO file according to a new PO template
3844file, when this tool hypothesises that some new @code{msgid} has
3845been modified only slightly out of an older one, and chooses to pair
3846what it thinks to be the old translation for the new modified entry.
3847The slight alteration in the original string (the @code{msgid} string)
3848should often be reflected in the translated string, and this requires
3849the intervention of the translator.  For this reason, @code{msgmerge}
3850might mark some entries as being fuzzy.
3851
3852@emindex moving by fuzzy entries
3853Also, the translator may decide herself to mark an entry as fuzzy
3854for her own convenience, when she wants to remember that the entry
3855has to be later revisited.  So, some commands are more specifically
3856related to fuzzy entry processing.
3857
3858@table @kbd
3859@item f
3860@efindex f@r{, PO Mode command}
3861@c better append "-entry" all the time. -ke-
3862Find the next fuzzy entry (@code{po-next-fuzzy-entry}).
3863
3864@item F
3865@efindex F@r{, PO Mode command}
3866Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}).
3867
3868@item @key{TAB}
3869@efindex TAB@r{, PO Mode command}
3870Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}).
3871
3872@end table
3873
3874@efindex f@r{, PO Mode command}
3875@efindex po-next-fuzzy-entry@r{, PO Mode command}
3876@efindex F@r{, PO Mode command}
3877@efindex po-previous-fuzzy-entry@r{, PO Mode command}
3878The commands @kbd{f} (@code{po-next-fuzzy-entry}) and @kbd{F}
3879(@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for
3880a fuzzy entry.  If none is found, the search is extended and wraps
3881around in the PO file buffer.
3882
3883@efindex TAB@r{, PO Mode command}
3884@efindex po-unfuzzy@r{, PO Mode command}
3885@evindex po-auto-select-on-unfuzzy@r{, PO Mode variable}
3886The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
3887attribute associated with an entry, usually leaving it translated.
Further, if the variable @code{po-auto-select-on-unfuzzy} is not
@code{nil}, the @kbd{@key{TAB}} command will automatically chase
3890for another interesting entry to work on.  The initial value of
3891@code{po-auto-select-on-unfuzzy} is @code{nil}.
3892
3893The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}.  However,
3894if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
3895edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
3896ensure some kind of double check, later.  In this case, the usual paradigm
3897is that an entry becomes fuzzy (if not already) whenever the translator
3898modifies it.  If she is satisfied with the translation, she then uses
3899@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
in the same stroke.  If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
3901to chase another entry, leaving the entry fuzzy.
3902
3903@efindex DEL@r{, PO Mode command}
3904@efindex po-fade-out-entry@r{, PO Mode command}
The translator may also use the @kbd{@key{DEL}} command
(@code{po-fade-out-entry}) over any translated entry to mark it as being
fuzzy, as an easy reminder that she wants to return to this entry
later.
3909
Also, when time comes to quit working on a PO file buffer with the @kbd{q}
command, the translator is asked for confirmation if any fuzzy strings
still exist.
3913
3914@node Untranslated Entries
3915@subsection Untranslated Entries
3916@cindex untranslated entries
3917
3918When @code{xgettext} originally creates a PO file, unless told
3919otherwise, it initializes the @code{msgid} field with the untranslated
string, and leaves the @code{msgstr} string empty.  Such entries,
having an empty translation, are said to be @dfn{untranslated} entries.
Later, when the programmer slightly modifies some string right in
the program, this change is reflected in the PO file
by the appearance of a new untranslated entry for the modified string.
3925
3926The usual commands moving from entry to entry consider untranslated
3927entries on the same level as active entries.  Untranslated entries
3928are easily recognizable by the fact they end with @w{@samp{msgstr ""}}.
3929
3930@emindex moving by untranslated entries
3931The work of the translator might be (quite naively) seen as the process
3932of seeking for an untranslated entry, editing a translation for
3933it, and repeating these actions until no untranslated entries remain.
3934Some commands are more specifically related to untranslated entry
3935processing.
3936
3937@table @kbd
3938@item u
3939@efindex u@r{, PO Mode command}
3940Find the next untranslated entry (@code{po-next-untranslated-entry}).
3941
3942@item U
3943@efindex U@r{, PO Mode command}
Find the previous untranslated entry (@code{po-previous-untranslated-entry}).
3945
3946@item k
3947@efindex k@r{, PO Mode command}
3948Turn the current entry into an untranslated one (@code{po-kill-msgstr}).
3949
3950@end table
3951
3952@efindex u@r{, PO Mode command}
3953@efindex po-next-untranslated-entry@r{, PO Mode command}
3954@efindex U@r{, PO Mode command}
@efindex po-previous-untranslated-entry@r{, PO Mode command}
The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U}
(@code{po-previous-untranslated-entry}) move forwards or backwards,
3958chasing for an untranslated entry.  If none is found, the search is
3959extended and wraps around in the PO file buffer.
3960
3961@efindex k@r{, PO Mode command}
3962@efindex po-kill-msgstr@r{, PO Mode command}
3963An entry can be turned back into an untranslated entry by
3964merely emptying its translation, using the command @kbd{k}
3965(@code{po-kill-msgstr}).  @xref{Modifying Translations}.
3966
3967Also, when time comes to quit working on a PO file buffer
3968with the @kbd{q} command, the translator is asked for confirmation,
3969if some untranslated string still exists.
3970
3971@node Obsolete Entries
3972@subsection Obsolete Entries
3973@cindex obsolete entries
3974
3975By @dfn{obsolete} PO file entries, we mean those entries which are
commented out, usually by @code{msgmerge} when it finds that the
translation is no longer needed by the package being localized.
3978
3979The usual commands moving from entry to entry consider obsolete
3980entries on the same level as active entries.  Obsolete entries are
3981easily recognizable by the fact that all their lines start with
3982@code{#}, even those lines containing @code{msgid} or @code{msgstr}.
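
Taken together with the preceding sections, the recognition rules for the
four kinds of entries can be sketched in Python.  This is a simplified
illustration, not the logic of PO mode: it assumes that obsolete lines
carry the @code{#~} prefix written by GNU @code{msgmerge}, that attributes
appear on a comment line starting with @code{#,}, and that an obsolete
entry is reported as obsolete even if it is also fuzzy.  The function name
@code{classify} is hypothetical.

```python
def classify(entry_text):
    """Label one PO entry, given as the text of its lines."""
    lines = entry_text.splitlines()
    # Obsolete entries keep msgid and msgstr behind the "#~" marker.
    if any(line.startswith("#~") for line in lines):
        return "obsolete"
    # Attributes live on "#," comment lines, separated by commas.
    for line in lines:
        if line.startswith("#,"):
            flags = [flag.strip() for flag in line[2:].split(",")]
            if "fuzzy" in flags:
                return "fuzzy"
    # Collect the quoted pieces that make up the msgstr field.
    pieces = []
    in_msgstr = False
    for line in lines:
        if line.startswith("msgstr"):
            in_msgstr = True
        elif not line.startswith('"'):
            in_msgstr = False
        if in_msgstr and '"' in line:
            pieces.append(line[line.find('"') + 1:line.rfind('"')])
    return "translated" if "".join(pieces) else "untranslated"
```

An entry whose @code{msgstr} pieces are all empty is reported as
untranslated, which matches the @w{@samp{msgstr ""}} rule given for
untranslated entries.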
3983
3984Commands exist for emptying the translation or reinitializing it
3985to the original untranslated string.  Commands interfacing with the
3986kill ring may force some previously saved text into the translation.
3987The user may interactively edit the translation.  All these commands
3988may apply to obsolete entries, carefully leaving the entry obsolete
3989after the fact.
3990
3991@emindex moving by obsolete entries
3992Moreover, some commands are more specifically related to obsolete
3993entry processing.
3994
3995@table @kbd
3996@item o
3997@efindex o@r{, PO Mode command}
3998Find the next obsolete entry (@code{po-next-obsolete-entry}).
3999
4000@item O
4001@efindex O@r{, PO Mode command}
4002Find the previous obsolete entry (@code{po-previous-obsolete-entry}).
4003
4004@item @key{DEL}
4005@efindex DEL@r{, PO Mode command}
4006Make an active entry obsolete, or zap out an obsolete entry
4007(@code{po-fade-out-entry}).
4008
4009@end table
4010
4011@efindex o@r{, PO Mode command}
4012@efindex po-next-obsolete-entry@r{, PO Mode command}
4013@efindex O@r{, PO Mode command}
4014@efindex po-previous-obsolete-entry@r{, PO Mode command}
4015The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O}
4016(@code{po-previous-obsolete-entry}) move forwards or backwards,
4017chasing for an obsolete entry.  If none is found, the search is
4018extended and wraps around in the PO file buffer.
4019
4020PO mode does not provide ways for un-commenting an obsolete entry
4021and making it active, because this would reintroduce an original
4022untranslated string which does not correspond to any marked string
4023in the program sources.  This goes with the philosophy of never
4024introducing useless @code{msgid} values.
4025
4026@efindex DEL@r{, PO Mode command}
4027@efindex po-fade-out-entry@r{, PO Mode command}
4028@emindex obsolete active entry
4029@emindex comment out PO file entry
4030However, it is possible to comment out an active entry, so making
4031it obsolete.  GNU @code{gettext} utilities will later react to the
4032disappearance of a translation by using the untranslated string.
4033The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
4034a little further towards annihilation.  If the entry is active (it is a
4035translated entry), then it is first made fuzzy.  If it is already fuzzy,
4036then the entry is merely commented out, with confirmation.  If the entry
4037is already obsolete, then it is completely deleted from the PO file.
4038It is easy to recycle the translation so deleted into some other PO file
4039entry, usually one which is untranslated.  @xref{Modifying Translations}.
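
The successive effects of @kbd{@key{DEL}} form a small state machine, which
the following Python sketch makes explicit.  It is only a model of the
description above: the state names, the function name @code{fade_out}, and
the @code{confirm} callback standing in for the confirmation prompt are
assumptions, and states not mentioned in the text are left unchanged.

```python
def fade_out(state, confirm=lambda: True):
    """Return the state an entry reaches after one DEL, or None if deleted."""
    if state == "translated":
        return "fuzzy"
    if state == "fuzzy":
        # Commenting out is the step that asks for confirmation.
        return "obsolete" if confirm() else "fuzzy"
    if state == "obsolete":
        return None  # the entry is removed from the PO file
    # The text above does not say what DEL does in other cases (for
    # example on an untranslated entry), so this sketch changes nothing.
    return state
```

Starting from a translated entry, three successive calls walk through
fuzzy and obsolete before the entry disappears.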
4040
4041Here is a quite interesting problem to solve for later development of
4042PO mode, for those nights you are not sleepy.  The idea would be that
4043PO mode might become bright enough, one of these days, to make good
4044guesses at retrieving the most probable candidate, among all obsolete
4045entries, for initializing the translation of a newly appeared string.
4046I think it might be a quite hard problem to do this algorithmically, as
4047we have to develop good and efficient measures of string similarity.
Right now, PO mode completely leaves the decision to the translator
when the time comes to find the adequate obsolete translation; it
merely tries to provide handy tools for helping her to do so.
4051
4052@node Modifying Translations
4053@subsection Modifying Translations
4054@cindex editing translations
4055@emindex editing translations
4056
PO mode prevents direct modification of the PO file by the usual
means Emacs gives for altering a buffer's contents.  By doing so,
it aims to help the translator avoid little clerical errors
4060about the overall file format, or the proper quoting of strings,
4061as those errors would be easily made.  Other kinds of errors are
4062still possible, but some may be caught and diagnosed by the batch
4063validation process, which the translator may always trigger by the
4064@kbd{V} command.  For all other errors, the translator has to rely on
4065her own judgment, and also on the linguistic reports submitted to her
4066by the users of the translated package, having the same mother tongue.
4067
When the time comes to create a translation, or to correct an error
diagnosed mechanically or reported by a user, the translator has to
resort to the following commands for modifying the translations.
4071
4072@table @kbd
4073@item @key{RET}
4074@efindex RET@r{, PO Mode command}
4075Interactively edit the translation (@code{po-edit-msgstr}).
4076
4077@item @key{LFD}
4078@itemx C-j
4079@efindex LFD@r{, PO Mode command}
4080@efindex C-j@r{, PO Mode command}
4081Reinitialize the translation with the original, untranslated string
4082(@code{po-msgid-to-msgstr}).
4083
4084@item k
4085@efindex k@r{, PO Mode command}
4086Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}).
4087
4088@item w
4089@efindex w@r{, PO Mode command}
4090Save the translation on the kill ring, without deleting it
4091(@code{po-kill-ring-save-msgstr}).
4092
4093@item y
4094@efindex y@r{, PO Mode command}
4095Replace the translation, taking the new from the kill ring
4096(@code{po-yank-msgstr}).
4097
4098@end table
4099
4100@efindex RET@r{, PO Mode command}
4101@efindex po-edit-msgstr@r{, PO Mode command}
4102The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
4103window meant to edit in a new translation, or to modify an already existing
4104translation.  The new window contains a copy of the translation taken from
the current PO file entry, all ready for editing, expunged of all quoting
4106marks, fully modifiable and with the complete extent of Emacs modifying
4107commands.  When the translator is done with her modifications, she may use
4108@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
4109results, or @w{@kbd{C-c C-k}} to abort her modifications.  @xref{Subedit},
4110for more information.
4111
4112@efindex LFD@r{, PO Mode command}
4113@efindex C-j@r{, PO Mode command}
4114@efindex po-msgid-to-msgstr@r{, PO Mode command}
4115The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or
4116reinitializes the translation with the original string.  This command is
4117normally used when the translator wants to redo a fresh translation of
4118the original string, disregarding any previous work.
4119
4120@evindex po-auto-edit-with-msgid@r{, PO Mode variable}
It is possible to arrange things so that, whenever an untranslated
entry is edited, the @kbd{@key{LFD}} command is automatically executed.  If you set
4123@code{po-auto-edit-with-msgid} to @code{t}, the translation gets
4124initialised with the original string, in case none exists already.
4125The default value for @code{po-auto-edit-with-msgid} is @code{nil}.
4126
4127@emindex starting a string translation
4128In fact, whether it is best to start a translation with an empty
4129string, or rather with a copy of the original string, is a matter of
4130taste or habit.  Sometimes, the source language and the
target language are so different that it is simply best to start writing
4132on an empty page.  At other times, the source and target languages
4133are so close that it would be a waste to retype a number of words
4134already being written in the original string.  A translator may also
4135like having the original string right under her eyes, as she will
4136progressively overwrite the original text with the translation, even
4137if this requires some extra editing work to get rid of the original.
4138
4139@emindex cut and paste for translated strings
4140@efindex k@r{, PO Mode command}
4141@efindex po-kill-msgstr@r{, PO Mode command}
4142@efindex w@r{, PO Mode command}
4143@efindex po-kill-ring-save-msgstr@r{, PO Mode command}
4144The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
4145translation string, so turning the entry into an untranslated
one.  But while doing so, its previous contents are set aside in
4147a special place, known as the kill ring.  The command @kbd{w}
4148(@code{po-kill-ring-save-msgstr}) has also the effect of taking a
4149copy of the translation onto the kill ring, but it otherwise leaves
4150the entry alone, and does @emph{not} remove the translation from the
4151entry.  Both commands use exactly the Emacs kill ring, which is shared
4152between buffers, and which is well known already to Emacs lovers.
4153
4154The translator may use @kbd{k} or @kbd{w} many times in the course
4155of her work, as the kill ring may hold several saved translations.
4156From the kill ring, strings may later be reinserted in various
4157Emacs buffers.  In particular, the kill ring may be used for moving
4158translation strings between different entries of a single PO file
4159buffer, or if the translator is handling many such buffers at once,
4160even between PO files.
4161
4162To facilitate exchanges with buffers which are not in PO mode, the
4163translation string put on the kill ring by the @kbd{k} command is fully
4164unquoted before being saved: external quotes are removed, multi-line
strings are concatenated, and backslash escape sequences are turned
4166into their corresponding characters.  In the special case of obsolete
4167entries, the translation is also uncommented prior to saving.
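
The unquoting steps just listed can be sketched in Python.  The sketch
assumes a small table of escape sequences (other escapes, such as octal
ones, are not handled), assumes the @code{#~} prefix for obsolete entries,
and uses the hypothetical name @code{unquote_field}; it illustrates the
idea and is not the PO mode code.

```python
import re

# Escape sequences handled by this sketch; anything else after a
# backslash is kept as the bare character.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

def unescape(match):
    """Map one backslash escape to the character it stands for."""
    char = match.group(1)
    return ESCAPES.get(char, char)

def unquote_field(lines):
    """Turn the quoted lines of one msgstr field into plain text."""
    text = []
    for line in lines:
        # Obsolete entries carry a "#~" comment prefix; drop it first.
        if line.startswith("#~"):
            line = line[2:]
        line = line.strip()
        if line.startswith("msgstr"):
            line = line[len("msgstr"):].strip()
        # Remove the external quotes, then decode the escapes inside.
        body = line[1:-1]
        text.append(re.sub(r"\\(.)", unescape, body))
    # Multi-line strings are simply concatenated.
    return "".join(text)
```

For instance, the three lines @code{msgstr ""}, @code{"Hello,\n"} and
@code{"world!\n"} come out as two lines of plain text, each ending with a
real newline character.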
4168
4169@efindex y@r{, PO Mode command}
4170@efindex po-yank-msgstr@r{, PO Mode command}
4171The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the
4172translation of the current entry by a string taken from the kill ring.
4173Following Emacs terminology, we then say that the replacement
4174string is @dfn{yanked} into the PO file buffer.
4175@xref{Yanking, , , emacs, The Emacs Editor}.
4176The first time @kbd{y} is used, the translation receives the value of
4177the most recent addition to the kill ring.  If @kbd{y} is typed once
4178again, immediately, without intervening keystrokes, the translation
4179just inserted is taken away and replaced by the second most recent
4180addition to the kill ring.  By repeating @kbd{y} many times in a row,
4181the translator may travel along the kill ring for saved strings,
4182until she finds the string she really wanted.
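
The way repeated @kbd{y} commands travel along the kill ring can be
modelled with a few lines of Python.  This is a simplified sketch, not
the Emacs implementation; the class name @code{KillRing} and its
@code{repeated} parameter are hypothetical, and wrapping around to the
most recent string after the oldest one is an assumption based on the
usual behavior of a ring.

```python
class KillRing:
    """Toy model of a kill ring; the most recent string comes first."""

    def __init__(self):
        self._items = []
        self._pos = 0

    def push(self, text):
        """Save a string, as `k` or `w` would do."""
        self._items.insert(0, text)
        self._pos = 0

    def yank(self, repeated=False):
        """Return the string `y` would insert.

        A first `y` takes the most recent addition; a `y` typed again
        immediately (repeated=True) steps to the next older string.
        """
        if not self._items:
            raise IndexError("kill ring is empty")
        if repeated:
            self._pos = (self._pos + 1) % len(self._items)
        else:
            self._pos = 0
        return self._items[self._pos]
```

Each call with @code{repeated=True} replaces the previously yanked string
by the next older one, which is how the translator eventually reaches the
string she wanted.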
4183
When a string is yanked into a PO file entry, it is fully and
automatically requoted to comply with the format PO files should
have.  Further, if the entry is obsolete, PO mode then appropriately
pushes the inserted string inside comments.  Once again, translators
should not burden themselves with quoting considerations, beyond
making sure, of course, that the translated string itself suits
the program using it.
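
As an illustration of what requoting involves, here is a minimal Python
sketch going from a plain string to PO file lines.  The function name
@code{quote_msgstr}, its @code{obsolete} parameter and the choice of
breaking lines after each newline are assumptions made for the example;
PO mode itself may lay out the lines differently.

```python
def quote_msgstr(text, obsolete=False):
    """Return the PO file lines representing `text` as a msgstr field."""
    escaped = (text.replace("\\", "\\\\")      # backslashes first
                   .replace('"', '\\"')
                   .replace("\t", "\\t")
                   .replace("\n", "\\n\n"))    # keep a break after each \n
    segments = [s for s in escaped.split("\n") if s]
    if len(segments) <= 1:
        lines = ['msgstr "%s"' % (segments[0] if segments else "")]
    else:
        # Multi-line strings start with an empty string on the keyword line.
        lines = ['msgstr ""'] + ['"%s"' % s for s in segments]
    prefix = "#~ " if obsolete else ""          # obsolete entries are commented
    return [prefix + line for line in lines]
```

Whatever the translator yanks, the quoting rules are applied for her, and
obsolete entries receive their comment prefix automatically.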
4191
4192Note that @kbd{k} or @kbd{w} are not the only commands pushing strings
4193on the kill ring, as almost any PO mode command replacing translation
4194strings (or the translator comments) automatically saves the old string
4195on the kill ring.  The main exceptions to this general rule are the
4196yanking commands themselves.
4197
4198@emindex using obsolete translations to make new entries
4199To better illustrate the operation of killing and yanking, let's
4200use an actual example, taken from a common situation.  When the
4201programmer slightly modifies some string right in the program, his
4202change is later reflected in the PO file by the appearance
4203of a new untranslated entry for the modified string, and the fact
4204that the entry translating the original or unmodified string becomes
4205obsolete.  In many cases, the translator might spare herself some work
4206by retrieving the unmodified translation from the obsolete entry,
4207then initializing the untranslated entry @code{msgstr} field with
this retrieved translation.  Once this is done, the obsolete entry is
no longer wanted, and may be safely deleted.
4210
4211When the translator finds an untranslated entry and suspects that a
4212slight variant of the translation exists, she immediately uses @kbd{m}
4213to mark the current entry location, then starts chasing obsolete
4214entries with @kbd{o}, hoping to find some translation corresponding
4215to the unmodified string.  Once found, she uses the @kbd{@key{DEL}} command
4216for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
4217the translation, that is, pushes the translation on the kill ring.
4218Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
4219then @emph{yanks} the saved translation right into the @code{msgstr}
4220field.  The translator is then free to use @kbd{@key{RET}} for fine
4221tuning the translation contents, and maybe to later use @kbd{u},
4222then @kbd{m} again, for going on with the next untranslated string.
4223
When some sequence of keys has to be typed over and over again, the
translator may find it useful to become better acquainted with the Emacs
capability of learning these sequences and playing them back on request.
4227@xref{Keyboard Macros, , , emacs, The Emacs Editor}.
4228
4229@node Modifying Comments
4230@subsection Modifying Comments
4231@cindex editing comments in PO files
4232@emindex editing comments
4233
4234Any translation work done seriously will raise many linguistic
4235difficulties, for which decisions have to be made, and the choices
4236further documented.  These documents may be saved within the
4237PO file in form of translator comments, which the translator
4238is free to create, delete, or modify at will.  These comments may
4239be useful to herself when she returns to this PO file after a while.
4240
Comments not having whitespace after the initial @samp{#}, for example,
those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
comments; they are created exclusively by other @code{gettext} tools.
The commands below will therefore never alter such system-added comments,
which are not meant for the translator to modify.  @xref{PO Files}.
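
The distinction can be expressed as a one-line test on the second
character of the comment line.  The following Python sketch is only an
illustration; the function name @code{is_translator_comment} is
hypothetical, and treating a bare @samp{#} as an (empty) translator
comment is an assumption.

```python
def is_translator_comment(line):
    """True if `line` is a comment the translator owns.

    Translator comments have whitespace (or nothing at all) right after
    the initial '#'; '#.', '#:', '#,' and the like belong to the tools.
    """
    return line.startswith("#") and (len(line) == 1 or line[1].isspace())
```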
4246
4247The following commands are somewhat similar to those modifying translations,
4248so the general indications given for those apply here.  @xref{Modifying
4249Translations}.
4250
4251@table @kbd
4252
4253@item #
4254@efindex #@r{, PO Mode command}
4255Interactively edit the translator comments (@code{po-edit-comment}).
4256
4257@item K
4258@efindex K@r{, PO Mode command}
Save the translator comments on the kill ring, and delete them
4260(@code{po-kill-comment}).
4261
4262@item W
4263@efindex W@r{, PO Mode command}
Save the translator comments on the kill ring, without deleting them
4265(@code{po-kill-ring-save-comment}).
4266
4267@item Y
4268@efindex Y@r{, PO Mode command}
Replace the translator comments, taking the new ones from the kill ring
4270(@code{po-yank-comment}).
4271
4272@end table
4273
4274These commands parallel PO mode commands for modifying the translation
4275strings, and behave much the same way as they do, except that they handle
4276this part of PO file comments meant for translator usage, rather
4277than the translation strings.  So, if the descriptions given below are
4278slightly succinct, it is because the full details have already been given.
4279@xref{Modifying Translations}.
4280
4281@efindex #@r{, PO Mode command}
4282@efindex po-edit-comment@r{, PO Mode command}
4283The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
4284containing a copy of the translator comments on the current PO file entry.
4285If there are no such comments, PO mode understands that the translator wants
4286to add a comment to the entry, and she is presented with an empty screen.
Comment marks (@code{#}) and the space following them are automatically
removed before editing, and reinstated afterwards.  For translator comments
pertaining to obsolete entries, the uncommenting and recommenting operations
are done twice.  Once in the editing window, the keys @w{@kbd{C-c C-c}}
allow the translator to signal that she is finished editing the comment.
4292@xref{Subedit}, for further details.
4293
4294@evindex po-subedit-mode-hook@r{, PO Mode variable}
4295Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4296the string has been inserted in the edit buffer.
4297
4298@efindex K@r{, PO Mode command}
4299@efindex po-kill-comment@r{, PO Mode command}
4300@efindex W@r{, PO Mode command}
4301@efindex po-kill-ring-save-comment@r{, PO Mode command}
4302@efindex Y@r{, PO Mode command}
4303@efindex po-yank-comment@r{, PO Mode command}
4304The command @kbd{K} (@code{po-kill-comment}) gets rid of all
4305translator comments, while saving those comments on the kill ring.
4306The command @kbd{W} (@code{po-kill-ring-save-comment}) takes
4307a copy of the translator comments on the kill ring, but leaves
4308them undisturbed in the current entry.  The command @kbd{Y}
4309(@code{po-yank-comment}) completely replaces the translator comments
4310by a string taken at the front of the kill ring.  When this command
4311is immediately repeated, the comments just inserted are withdrawn,
4312and replaced by other strings taken along the kill ring.
4313
4314On the kill ring, all strings have the same nature.  There is no
4315distinction between @emph{translation} strings and @emph{translator
4316comments} strings.  So, for example, let's presume the translator
4317has just finished editing a translation, and wants to create a new
4318translator comment to document why the previous translation was
4319not good, just to remember what was the problem.  Foreseeing that she
4320will do that in her documentation, the translator may want to quote
4321the previous translation in her translator comments.  To do so, she
4322may initialize the translator comments with the previous translation,
4323still at the head of the kill ring.  Because editing already pushed the
4324previous translation on the kill ring, she merely has to type @kbd{M-w}
4325prior to @kbd{#}, and the previous translation will be right there,
4326all ready for being introduced by some explanatory text.
4327
4328On the other hand, presume there are some translator comments already
4329and that the translator wants to add to those comments, instead
4330of wholly replacing them.  Then, she should edit the comment right
4331away with @kbd{#}.  Once inside the editing window, she can use the
4332regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
4333(@code{yank-pop}) to get the previous translation where she likes.
4334
4335@node Subedit
4336@subsection Details of Sub Edition
4337@emindex subedit minor mode
4338
4339The PO subedit minor mode has a few peculiarities worth being described
4340in fuller detail.  It installs a few commands over the usual editing set
4341of Emacs, which are described below.
4342
4343@table @kbd
4344@item C-c C-c
4345@efindex C-c C-c@r{, PO Mode command}
4346Complete edition (@code{po-subedit-exit}).
4347
4348@item C-c C-k
4349@efindex C-c C-k@r{, PO Mode command}
4350Abort edition (@code{po-subedit-abort}).
4351
4352@item C-c C-a
4353@efindex C-c C-a@r{, PO Mode command}
4354Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}).
4355
4356@end table
4357
4358@emindex exiting PO subedit
4359@efindex C-c C-c@r{, PO Mode command}
4360@efindex po-subedit-exit@r{, PO Mode command}
The window's contents represent a translation for a given message,
4362or a translator comment.  The translator may modify this window to
4363her heart's content.  Once this is done, the command @w{@kbd{C-c C-c}}
4364(@code{po-subedit-exit}) may be used to return the edited translation into
4365the PO file, replacing the original translation, even if it moved out of
4366sight or if buffers were switched.
4367
4368@efindex C-c C-k@r{, PO Mode command}
4369@efindex po-subedit-abort@r{, PO Mode command}
If the translator becomes unsatisfied with her translation or comment,
to the extent that she prefers keeping what existed prior to the
@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
(@code{po-subedit-abort}) to simply discard the edit, while preserving
the original translation or comment.  Another way would be for her to exit
normally with @w{@kbd{C-c C-c}}, then type @code{U} once to undo the
whole effect of the last edit.
4377
4378@efindex C-c C-a@r{, PO Mode command}
4379@efindex po-subedit-cycle-auxiliary@r{, PO Mode command}
4380The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary})
4381allows for glancing through translations
4382already achieved in other languages, directly while editing the current
translation.  This may be quite convenient when the translator is fluent
in many languages, but of course, only makes sense when such completed
4385auxiliary PO files are already available to her (@pxref{Auxiliary}).
4386
4387Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4388the string has been inserted in the edit buffer.
4389
While editing her translation, the translator should take care not to
insert unwanted @kbd{@key{RET}} (newline) characters at the end of
the translated string if those are not meant to be there, nor to remove
such characters when they are required.  Since these characters are not
visible in the editing buffer, they are easily introduced by mistake.
To help her, @kbd{@key{RET}} automatically puts the character @code{<}
at the end of the string being edited, but this @code{<} is not really
part of the string.  On exiting the editing window with @w{@kbd{C-c C-c}},
PO mode automatically removes such @code{<} and all whitespace added after
it.  If the translator adds characters after the terminating @code{<}, it
loses its delimiting property and becomes an integral part of the string.
4401If she removes the delimiting @code{<}, then the edited string is taken
4402@emph{as is}, with all trailing newlines, even if invisible.  Also, if
4403the translated string ought to end itself with a genuine @code{<}, then
4404the delimiting @code{<} may not be removed; so the string should appear,
4405in the editing window, as ending with two @code{<} in a row.
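
The rules for the terminating @code{<} fit in a few lines of Python.
The sketch below merely restates the behavior described above; the
function name @code{finish_subedit} is hypothetical and the code is not
taken from PO mode.

```python
def finish_subedit(buffer_text):
    """Return the string stored in the PO file when the subedit is completed."""
    stripped = buffer_text.rstrip()      # ignore whitespace added after '<'
    if stripped.endswith("<"):
        return stripped[:-1]             # drop the delimiter itself
    # No trailing delimiter: the text is taken as is, trailing newlines included.
    return buffer_text
```

With this model, a buffer ending in two @code{<} in a row yields a string
ending with one genuine @code{<}, while text typed after the delimiter
turns it into an ordinary character.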
4406
4407@emindex editing multiple entries
4408When a translation (or a comment) is being edited, the translator may move
4409the cursor back into the PO file buffer and freely move to other entries,
4410browsing at will.  If, with an edition pending, the translator wanders in the
4411PO file buffer, she may decide to start modifying another entry.  Each entry
4412being edited has its own subedit buffer.  It is possible to simultaneously
4413edit the translation @emph{and} the comment of a single entry, or to
4414edit entries in different PO files, all at once.  Typing @kbd{@key{RET}}
on a field already being edited merely resumes that particular edit.  Still,
the translator had better be comfortable handling many Emacs windows!
4417
4418@emindex pending subedits
4419Pending subedits may be completed or aborted in any order, regardless
4420of how or when they were started.  When many subedits are pending and the
4421translator asks for quitting the PO file (with the @kbd{q} command), subedits
4422are automatically resumed one at a time, so she may decide for each of them.
4423
4424@node C Sources Context
4425@subsection C Sources Context
4426@emindex consulting program sources
4427@emindex looking at the source to aid translation
4428@emindex use the source, Luke
4429
4430PO mode is particularly powerful when used with PO files
4431created through GNU @code{gettext} utilities, as those utilities
4432insert special comments in the PO files they generate.
4433Some of these special comments relate the PO file entry to
4434exactly where the untranslated string appears in the program sources.
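
These source references are the @samp{#:} comment lines, each holding
one or more @samp{file:line} pairs.  As a rough illustration, the
following Python sketch splits such a line into its parts.  The function
name @code{parse_references} is hypothetical, and the sketch assumes file
names without embedded spaces.

```python
def parse_references(comment_line):
    """Split a '#:' comment line into a list of (file, line number) pairs."""
    if not comment_line.startswith("#:"):
        return []
    refs = []
    for token in comment_line[2:].split():
        path, sep, lineno = token.rpartition(":")
        if sep and lineno.isdigit():
            refs.append((path, int(lineno)))
        else:
            refs.append((token, None))   # reference without a line number
    return refs
```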
4435
4436When the translator gets to an untranslated entry, she is fairly
4437often faced with an original string which is not as informative as
4438it normally should be, being succinct, cryptic, or otherwise ambiguous.
4439Before choosing how to translate the string, she needs to understand
4440better what the string really means and how tight the translation has
4441to be.  Most of the time, when problems arise, the only way left to make
4442her judgment is looking at the true program sources from where this
4443string originated, searching for surrounding comments the programmer
4444might have put in there, and looking around for helping clues of
4445@emph{any} kind.
4446
Surely, when looking at program sources, the translator will receive
more help if she is a fluent programmer.  However, even if she is
not versed in programming and feels a little lost in C code, the
translator should not be shy about taking a look, once in a while.
Most probably, she will still be able to find some of the
hints she needs.  She will quickly learn to feel at ease
in program code, paying more attention to the programmer's comments,
variable and function names (if he dared to choose them well), and
overall organization, than to the program code itself.
4456
4457@emindex find source fragment for a PO file entry
The following commands are meant to help the translator get
program source context for a PO file entry.
4460
4461@table @kbd
4462@item s
4463@efindex s@r{, PO Mode command}
4464Resume the display of a program source context, or cycle through them
4465(@code{po-cycle-source-reference}).
4466
4467@item M-s
4468@efindex M-s@r{, PO Mode command}
Display a program source context selected by menu
4470(@code{po-select-source-reference}).
4471
4472@item S
4473@efindex S@r{, PO Mode command}
4474Add a directory to the search path for source files
4475(@code{po-consider-source-path}).
4476
4477@item M-S
4478@efindex M-S@r{, PO Mode command}
4479Delete a directory from the search path for source files
4480(@code{po-ignore-source-path}).
4481
4482@end table
4483
4484@efindex s@r{, PO Mode command}
4485@efindex po-cycle-source-reference@r{, PO Mode command}
4486@efindex M-s@r{, PO Mode command}
4487@efindex po-select-source-reference@r{, PO Mode command}
4488The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s}
4489(@code{po-select-source-reference}) both open another window displaying
4490some source program file, and already positioned in such a way that
4491it shows an actual use of the string to be translated.  By doing
4492so, the command gives source program context for the string.  But if
4493the entry has no source context references, or if all references
4494are unresolved along the search path for program sources, then the
4495command diagnoses this as an error.
4496
4497Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
4498in the PO file window.  If the translator really wants to
4499get into the program source window, she ought to do it explicitly,
4500maybe by using command @kbd{O}.
4501
When @kbd{s} is typed for the first time, or for a PO file entry which
is different from the last one used for getting source context, then the
4504command reacts by giving the first context available for this entry,
4505if any.  If some context has already been recently displayed for the
4506current PO file entry, and the translator wandered off to do other
4507things, typing @kbd{s} again will merely resume, in another window,
4508the context last displayed.  In particular, if the translator moved
4509the cursor away from the context in the source file, the command will
4510bring the cursor back to the context.  By using @kbd{s} many times
4511in a row, with no other commands intervening, PO mode will cycle to
4512the next available contexts for this particular entry, getting back
4513to the first context once the last has been shown.
4514
4515The command @kbd{M-s} behaves differently.  Instead of cycling through
4516references, it lets the translator choose a particular reference among
many, and displays that reference.  It is best used with completion:
if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
response to the question, she will be offered a menu of all possible
references, as a reminder of which are the acceptable answers.
4521This command is useful only where there are really many contexts
4522available for a single string to translate.
4523
4524@efindex S@r{, PO Mode command}
4525@efindex po-consider-source-path@r{, PO Mode command}
4526@efindex M-S@r{, PO Mode command}
4527@efindex po-ignore-source-path@r{, PO Mode command}
4528Program source files are usually found relative to where the PO
4529file stands.  As a special provision, when this fails, the file is
4530also looked for, but relative to the directory immediately above it.
4531Those two cases take proper care of most PO files.  However, it might
4532happen that a PO file has been moved, or is edited in a different
place than its normal location.  When this happens, the translator
should tell PO mode in which directory the genuine PO file
normally sits.  Many such directories may be specified, and all together, they
4536constitute what is called the @dfn{search path} for program sources.
4537The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
4538enter a new directory at the front of the search path, and the command
4539@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
4540one of the directories she does not want anymore on the search path.
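
The lookup order can be sketched in Python as follows.  This is an
illustration under stated assumptions, not PO mode's code: the function
name @code{resolve_source} is hypothetical, and trying the search path
directories only after the two default locations is a guess about the
order.

```python
from pathlib import Path


def resolve_source(reference, po_file, search_path=()):
    """Return the first existing file matching `reference`, or None.

    Candidates are tried relative to the PO file's directory, then to
    the directory immediately above it, then to each search path entry.
    """
    po_dir = Path(po_file).resolve().parent
    bases = [po_dir, po_dir.parent, *(Path(d) for d in search_path)]
    for base in bases:
        candidate = base / reference
        if candidate.is_file():
            return candidate
    return None
```

For a PO file stored in a @file{po/} subdirectory, a reference such as
@file{src/hello.c} is typically found through the second candidate, the
directory immediately above.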
4541
4542@node Auxiliary
4543@subsection Consulting Auxiliary PO Files
4544@emindex consulting translations to other languages
4545
PO mode is able to help the knowledgeable translator, fluent in
many languages, take advantage of translations already achieved
4548in other languages she just happens to know.  It provides these other
4549language translations as additional context for her own work.  Moreover,
4550it has features to ease the production of translations for many languages
4551at once, for translators preferring to work in this way.
4552
4553@cindex auxiliary PO file
4554@emindex auxiliary PO file
An @dfn{auxiliary} PO file is an existing PO file meant for the same
package the translator is working on, but targeted at a different
language.  Commands exist for declaring and handling auxiliary
PO files, and also for showing contexts for the entry under work.
4559
4560Here are the auxiliary file commands available in PO mode.
4561
4562@table @kbd
4563@item a
4564@efindex a@r{, PO Mode command}
4565Seek auxiliary files for another translation for the same entry
4566(@code{po-cycle-auxiliary}).
4567
4568@item C-c C-a
4569@efindex C-c C-a@r{, PO Mode command}
4570Switch to a particular auxiliary file (@code{po-select-auxiliary}).
4571
4572@item A
4573@efindex A@r{, PO Mode command}
4574Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}).
4575
4576@item M-A
4577@efindex M-A@r{, PO Mode command}
4578Remove this PO file from the list of auxiliary files
4579(@code{po-ignore-as-auxiliary}).
4580
4581@end table
4582
4583@efindex A@r{, PO Mode command}
4584@efindex po-consider-as-auxiliary@r{, PO Mode command}
4585@efindex M-A@r{, PO Mode command}
4586@efindex po-ignore-as-auxiliary@r{, PO Mode command}
4587Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
4588PO file to the list of auxiliary files, while command @kbd{M-A}
(@code{po-ignore-as-auxiliary}) just removes it.
4590
4591@efindex a@r{, PO Mode command}
4592@efindex po-cycle-auxiliary@r{, PO Mode command}
4593The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
4594files, round-robin, searching for a translated entry in some other language
having an @code{msgid} field identical to the one for the current entry.
4596The found PO file, if any, takes the place of the current PO file in
4597the display (its window gets on top).  Before doing so, the current PO
4598file is also made into an auxiliary file, if not already.  So, @kbd{a}
4599in this newly displayed PO file will seek another PO file, and so on,
4600so repeating @kbd{a} will eventually yield back the original PO file.
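
The round-robin search may be easier to follow as code.  The Python
sketch below models each PO file as a dictionary from @code{msgid} to
@code{msgstr}; the function name @code{cycle_auxiliary} and this data
layout are assumptions made for the illustration.

```python
def cycle_auxiliary(current, msgid, auxiliaries, catalogs):
    """Return the next auxiliary file holding a translation of `msgid`.

    `auxiliaries` is the ordered list of auxiliary file names (the
    current file is added to it if missing); `catalogs` maps each file
    name to a {msgid: msgstr} dictionary.  Returns None if no other
    file has a translated entry.
    """
    if current not in auxiliaries:
        auxiliaries.append(current)      # current file becomes auxiliary too
    start = auxiliaries.index(current)
    for step in range(1, len(auxiliaries)):
        name = auxiliaries[(start + step) % len(auxiliaries)]
        if catalogs[name].get(msgid):    # skip missing or untranslated entries
            return name
    return None
```

Because the current file joins the list before the search starts,
repeated calls eventually lead back to the file where the translator
began.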
4601
4602@efindex C-c C-a@r{, PO Mode command}
4603@efindex po-select-auxiliary@r{, PO Mode command}
4604The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator
4605for her choice of a particular auxiliary file, with completion, and
4606then switches to that selected PO file.  The command also checks if
the selected file has an @code{msgid} field identical to the one for
4608the current entry, and if yes, this entry becomes current.  Otherwise,
4609the cursor of the selected file is left undisturbed.
4610
For all this to work fully, auxiliary PO files will have to be normalized,
in such a way that @code{msgid} fields are written @emph{exactly}
the same way.  It is possible to write @code{msgid} fields in various
ways for representing the same string; differing representations would break the
proper behaviour of the auxiliary file commands of PO mode.  This is not
expected to be much of a problem in practice, as most existing PO files have
their @code{msgid} entries written by the same GNU @code{gettext} tools.
4618
4619@efindex normalize@r{, PO Mode command}
However, PO files initially created by PO mode itself, while marking
strings in source files, are normalized differently.  So are PO
files resulting from the @samp{M-x normalize} command.  Until these
discrepancies between PO mode and other GNU @code{gettext} tools get
fully resolved, the translator should stay aware of normalization issues.
4625
4626@node Compendium
4627@section Using Translation Compendia
4628@emindex using translation compendia
4629
4630@cindex compendium
4631A @dfn{compendium} is a special PO file containing a set of
4632translations recurring in many different packages.  The translator can
4633use gettext tools to build a new compendium, to add entries to her
4634compendium, and to initialize untranslated entries, or to update
4635already translated entries, from translations kept in the compendium.
4636
4637@menu
4638* Creating Compendia::          Merging translations for later use
4639* Using Compendia::             Using older translations if they fit
4640@end menu
4641
4642@node Creating Compendia
4643@subsection Creating Compendia
4644@cindex creating compendia
4645@cindex compendium, creating
4646
4647Basically every PO file consisting of translated entries only can be
4648declared as a valid compendium.  Often the translator wants to have
4649special compendia; let's consider two cases: @cite{concatenating PO
4650files} and @cite{extracting a message subset from a PO file}.
4651
4652@subsubsection Concatenate PO Files
4653
4654@cindex concatenating PO files into a compendium
4655@cindex accumulating translations
4656To concatenate several valid PO files into one compendium file you can
4657use @samp{msgcomm} or @samp{msgcat} (the latter preferred):
4658
4659@example
4660msgcat -o compendium.po file1.po file2.po
4661@end example
4662
By default, @code{msgcat} will accumulate divergent translations
for the same string.  Those occurrences will be marked as @code{fuzzy}
and decorated with highly visible marker lines; calling @code{msgcat} on
@file{file1.po}:
4667
4668@example
4669#: src/hello.c:200
4670#, c-format
4671msgid "Report bugs to <%s>.\n"
4672msgstr "Comunicar `bugs' a <%s>.\n"
4673@end example
4674
4675@noindent
4676and @file{file2.po}:
4677
4678@example
4679#: src/bye.c:100
4680#, c-format
4681msgid "Report bugs to <%s>.\n"
4682msgstr "Comunicar \"bugs\" a <%s>.\n"
4683@end example
4684
4685@noindent
4686will result in:
4687
4688@example
4689#: src/hello.c:200 src/bye.c:100
4690#, fuzzy, c-format
4691msgid "Report bugs to <%s>.\n"
4692msgstr ""
4693"#-#-#-#-#  file1.po  #-#-#-#-#\n"
4694"Comunicar `bugs' a <%s>.\n"
4695"#-#-#-#-#  file2.po  #-#-#-#-#\n"
4696"Comunicar \"bugs\" a <%s>.\n"
4697@end example
4698
4699@noindent
4700The translator will have to resolve this ``conflict'' manually; she
4701has to decide whether the first or the second version is appropriate
4702(or provide a new translation), to delete the ``marker lines'', and
4703finally to remove the @code{fuzzy} mark.
4704
If the translator knows in advance that the first translation found for a
message is always the best one, she can make use of the
@samp{--use-first} switch:
4708
4709@example
4710msgcat --use-first -o compendium.po file1.po file2.po
4711@end example
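
To make the two behaviors concrete, here is a small Python model of the
concatenation.  It is a sketch of the idea only, not how @code{msgcat} is
implemented: the function name @code{concatenate} and the dictionary
representation of a PO file are assumptions, and details such as source
references, flags other than @code{fuzzy}, and plural forms are left out.

```python
def concatenate(catalogs, use_first=False):
    """Merge catalogs given as (file name, {msgid: msgstr}) pairs.

    Returns {msgid: (msgstr, fuzzy)}.  Divergent translations are
    accumulated under marker lines and flagged fuzzy, unless
    use_first is set, in which case the first one found wins.
    """
    seen = {}
    for name, entries in catalogs:
        for msgid, msgstr in entries.items():
            seen.setdefault(msgid, []).append((name, msgstr))
    merged = {}
    for msgid, variants in seen.items():
        distinct = {msgstr for _, msgstr in variants}
        if use_first or len(distinct) == 1:
            merged[msgid] = (variants[0][1], False)
        else:
            text = "".join(
                f"#-#-#-#-#  {name}  #-#-#-#-#\n{msgstr}"
                for name, msgstr in variants
            )
            merged[msgid] = (text, True)
    return merged
```

Called on the two entries shown earlier, the default mode reproduces the
conflict text with its marker lines, while @code{use_first=True} keeps
only the translation from @file{file1.po}.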
4712
4713A good compendium file must not contain @code{fuzzy} or untranslated
4714entries.  If input files are ``dirty'' you must preprocess the input
4715files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}.
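
The selection performed by such a cleanup amounts to a simple filter, as
in this Python sketch.  The function name @code{keep_clean} and the
dictionary layout of an entry (keys @code{msgstr} and @code{flags}) are
assumptions made for the illustration.

```python
def keep_clean(entries):
    """Keep only entries that are translated and not marked fuzzy."""
    return [
        entry for entry in entries
        if entry["msgstr"] and "fuzzy" not in entry.get("flags", ())
    ]
```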
4716
4717@subsubsection Extract a Message Subset from a PO File
4718@cindex extracting parts of a PO file into a compendium
4719
4720Nobody wants to translate the same messages again and again; thus you
4721may wish to have a compendium file containing @file{getopt.c} messages.
4722
4723To extract a message subset (e.g., all @file{getopt.c} messages) from an
4724existing PO file into one compendium file you can use @samp{msggrep}:
4725
4726@example
4727msggrep --location src/getopt.c -o compendium.po file.po
4728@end example
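
Conceptually, this selection keeps the entries that have at least one
source reference to the given file.  A minimal Python sketch of that
idea follows; the function name @code{select_by_location} and the
@code{refs} key holding @samp{(file, line)} pairs are assumptions, and
matching the file name as a shell-style wildcard pattern is a
simplification.

```python
from fnmatch import fnmatch


def select_by_location(entries, pattern):
    """Keep entries having a source reference whose file matches `pattern`."""
    return [
        entry for entry in entries
        if any(fnmatch(path, pattern) for path, _line in entry["refs"])
    ]
```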
4729
4730@node Using Compendia
4731@subsection Using Compendia
4732
4733You can use a compendium file to initialize a translation from scratch
4734or to update an already existing translation.
4735
4736@subsubsection Initialize a New Translation File
4737@cindex initialize translations from a compendium
4738
Since a PO file with translations does not exist yet, the translator can
simply use @file{/dev/null} to fake the ``old'' translation file.
4741
4742@example
4743msgmerge --compendium compendium.po -o file.po /dev/null file.pot
4744@end example
4745
4746@subsubsection Update an Existing Translation File
4747@cindex update translations from a compendium
4748
4749Concatenate the compendium file(s) and the existing PO, merge the
4750result with the POT file and remove the obsolete entries (optional,
4751here done using @samp{msgattrib}):
4752
4753@example
4754msgcat --use-first -o update.po compendium1.po compendium2.po file.po
4755msgmerge update.po file.pot | msgattrib --no-obsolete > file.po
4756@end example
4757
4758@node Manipulating
4759@chapter Manipulating PO Files
4760@cindex manipulating PO files
4761
4762Sometimes it is necessary to manipulate PO files in a way that is better
4763performed automatically than by hand.  GNU @code{gettext} includes a
4764complete set of tools for this purpose.
4765
4766@cindex merging two PO files
4767When merging two packages into a single package, the resulting POT file
4768will be the concatenation of the two packages' POT files.  Thus the
4769maintainer must concatenate the two existing package translations into
4770a single translation catalog, for each language.  This is best performed
4771using @samp{msgcat}.  It is then the translators' duty to deal with any
4772possible conflicts that arose during the merge.
4773
4774@cindex encoding conversion
4775When a translator takes over the translation job from another translator,
4776but she uses a different character encoding in her locale, she will
4777convert the catalog to her character encoding.  This is best done through
4778the @samp{msgconv} program.
4779
4780When a maintainer takes a source file with tagged messages from another
4781package, he should also take the existing translations for this source
4782file (and not let the translators do the same job twice).  One way to do
4783this is through @samp{msggrep}, another is to create a POT file for
4784that source file and use @samp{msgmerge}.
4785
4786@cindex dialect
4787@cindex orthography
4788When a translator wants to adjust some translation catalog for a special
4789dialect or orthography --- for example, German as written in Switzerland
4790versus German as written in Germany --- she needs to apply some text
4791processing to every message in the catalog.  The tool for doing this is
4792@samp{msgfilter}.
4793
4794Another use of @code{msgfilter} is to produce approximately the POT file for
4795which a given PO file was made.  This can be done through a filter command
like @samp{msgfilter sed -e d | sed -e '/^# /d'}.  Note that the original
POT file may have had different comments and different plural message counts,
which is why it is better to use the original POT file if available.
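
Both uses boil down to applying one text transformation to every
translation while leaving the @code{msgid} fields alone.  The Python
sketch below models this; the names @code{filter_translations},
@code{to_swiss} and @code{blank} are hypothetical, and replacing
@samp{ß} by @samp{ss} is just one simple example of an orthography
adjustment.

```python
def filter_translations(entries, func):
    """Return new entries with `func` applied to every msgstr."""
    return [{**entry, "msgstr": func(entry["msgstr"])} for entry in entries]


def to_swiss(text):
    """Example filter: Swiss German writes 'ss' where Germany uses 'ß'."""
    return text.replace("ß", "ss")


def blank(text):
    """Example filter: delete the translation, as 'sed -e d' would."""
    return ""
```

Passing @code{to_swiss} adapts a catalog to another orthography, whereas
passing @code{blank} leaves only untranslated entries, which
approximates a POT file.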
4799
4800@cindex checking of translations
4801When a translator wants to check her translations, for example according
4802to orthography rules or using a non-interactive spell checker, she can do
4803so using the @samp{msgexec} program.
4804
4805@cindex duplicate elimination
4806When third party tools create PO or POT files, sometimes duplicates cannot
4807be avoided.  But the GNU @code{gettext} tools give an error when they
4808encounter duplicate msgids in the same file and in the same domain.
4809To merge duplicates, the @samp{msguniq} program can be used.
4810
4811@samp{msgcomm} is a more general tool for keeping or throwing away
4812duplicates, occurring in different files.
4813
4814@samp{msgcmp} can be used to check whether a translation catalog is
4815completely translated.
4816
4817@cindex attributes, manipulating
4818@samp{msgattrib} can be used to select and extract only the fuzzy
4819or untranslated messages of a translation catalog.
4820
4821@samp{msgen} is useful as a first step for preparing English translation
4822catalogs.  It copies each message's msgid to its msgstr.
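
In terms of data, the operation looks like the following Python sketch.
The function name @code{english_catalog} is hypothetical, and leaving
already translated entries untouched is an assumption of this sketch.

```python
def english_catalog(entries):
    """Fill each untranslated msgstr with a copy of its msgid."""
    return [
        entry if entry["msgstr"] else {**entry, "msgstr": entry["msgid"]}
        for entry in entries
    ]
```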
4823
4824Finally, for those applications where all these various programs are not
4825sufficient, a library @samp{libgettextpo} is provided that can be used to
4826write other specialized programs that process PO files.
4827
4828@menu
4829* msgcat Invocation::           Invoking the @code{msgcat} Program
4830* msgconv Invocation::          Invoking the @code{msgconv} Program
4831* msggrep Invocation::          Invoking the @code{msggrep} Program
4832* msgfilter Invocation::        Invoking the @code{msgfilter} Program
4833* msguniq Invocation::          Invoking the @code{msguniq} Program
4834* msgcomm Invocation::          Invoking the @code{msgcomm} Program
4835* msgcmp Invocation::           Invoking the @code{msgcmp} Program
4836* msgattrib Invocation::        Invoking the @code{msgattrib} Program
4837* msgen Invocation::            Invoking the @code{msgen} Program
4838* msgexec Invocation::          Invoking the @code{msgexec} Program
4839* Colorizing::                  Highlighting parts of PO files
4840* Other tools::                 Other tools for manipulating PO files
4841* libgettextpo::                Writing your own programs that process PO files
4842@end menu
4843
4844@node msgcat Invocation
4845@section Invoking the @code{msgcat} Program
4846
4847@include msgcat.texi
4848
4849@node msgconv Invocation
4850@section Invoking the @code{msgconv} Program
4851
4852@include msgconv.texi
4853
4854@node msggrep Invocation
4855@section Invoking the @code{msggrep} Program
4856
4857@include msggrep.texi
4858
4859@node msgfilter Invocation
4860@section Invoking the @code{msgfilter} Program
4861
4862@include msgfilter.texi
4863
4864@node msguniq Invocation
4865@section Invoking the @code{msguniq} Program
4866
4867@include msguniq.texi
4868
4869@node msgcomm Invocation
4870@section Invoking the @code{msgcomm} Program
4871
4872@include msgcomm.texi
4873
4874@node msgcmp Invocation
4875@section Invoking the @code{msgcmp} Program
4876
4877@include msgcmp.texi
4878
4879@node msgattrib Invocation
4880@section Invoking the @code{msgattrib} Program
4881
4882@include msgattrib.texi
4883
4884@node msgen Invocation
4885@section Invoking the @code{msgen} Program
4886
4887@include msgen.texi
4888
4889@node msgexec Invocation
4890@section Invoking the @code{msgexec} Program
4891
4892@include msgexec.texi
4893
4894@node Colorizing
4895@section Highlighting parts of PO files
4896
4897Translators are usually only interested in seeing the untranslated and
4898fuzzy messages of a PO file.  Also, when a message is set fuzzy because
4899the msgid changed, they want to see the differences between the previous
msgid and the current one (especially if the msgid is long and only a few
words in it have changed).  Finally, it's always welcome to highlight the
4902different sections of a message in a PO file (comments, msgid, msgstr, etc.).
4903
4904Such highlighting is possible through the options @samp{--color} and
4905@samp{--style}.  They are supported by all the programs that produce
4906a PO file on standard output, such as @code{msgcat}, @code{msgmerge},
4907and @code{msgunfmt}.
4908
4909@menu
4910* The --color option::          Triggering colorized output
4911* The TERM variable::           The environment variable @code{TERM}
4912* The --style option::          The @code{--style} option
4913* Style rules::                 Style rules for PO files
4914* Customizing less::            Customizing @code{less} for viewing PO files
4915@end menu
4916
4917@node The --color option
4918@subsection The @code{--color} option
4919
4920@opindex --color@r{, @code{msgcat} option}
4921The @samp{--color=@var{when}} option specifies under which conditions
4922colorized output should be generated.  The @var{when} part can be one of
4923the following:
4924
4925@table @code
4926@item always
4927@itemx yes
4928The output will be colorized.
4929
4930@item never
4931@itemx no
4932The output will not be colorized.
4933
4934@item auto
4935@itemx tty
4936The output will be colorized if the output device is a tty, i.e.@: when the
4937output goes directly to a text screen or terminal emulator window.
4938
4939@item html
4940The output will be colorized and be in HTML format.
4941
4942@item test
4943This is a special value, understood only by the @code{msgcat} program.  It
4944is explained in the next section (@ref{The TERM variable}).
4945@end table
4946
4947@noindent
4948@samp{--color} is equivalent to @samp{--color=yes}.  The default is
4949@samp{--color=auto}.
4950
4951Thus, a command like @samp{msgcat vi.po} will produce colorized output
4952when called by itself in a command window.  Whereas in a pipe, such as
4953@samp{msgcat vi.po | less -R}, it will not produce colorized output.  To
4954get colorized output in this situation nevertheless, use the command
4955@samp{msgcat --color vi.po | less -R}.
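
The way the @var{when} values select colorization can be sketched in C as
follows.  This is only an illustration of the rules listed in the table
above, not the code used by @code{msgcat}.  The function name
@code{should_colorize} is hypothetical, and the values @samp{html} and
@samp{test} are deliberately not handled.

@example
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of gettext.
   WHEN is the argument of --color=WHEN, or NULL if no --color option
   was given.  Returns 1 if output written to FD should be colorized,
   0 if it should not, and -1 if WHEN is not handled by this sketch.  */
int
should_colorize (const char *when, int fd)
@{
  /* No option at all behaves like --color=auto.  */
  if (when == NULL
      || strcmp (when, "auto") == 0 || strcmp (when, "tty") == 0)
    return isatty (fd) ? 1 : 0;
  if (strcmp (when, "always") == 0 || strcmp (when, "yes") == 0)
    return 1;
  if (strcmp (when, "never") == 0 || strcmp (when, "no") == 0)
    return 0;
  return -1;
@}
@end example

@noindent
With this sketch, @code{should_colorize (NULL, STDOUT_FILENO)} yields 0
when standard output is a pipe, which corresponds to the
@samp{msgcat vi.po | less -R} case.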
4956
The @samp{--color=html} option will produce output that can be viewed in
a browser.  This can be useful, for example, for Indic languages,
because the rendering of Indic scripts in browsers is usually better than
in terminal emulators.
4961
4962Note that the output produced with the @code{--color} option is @emph{not}
4963a valid PO file in itself.  It contains additional terminal-specific escape
4964sequences or HTML tags.  A PO file reader will give a syntax error when
4965confronted with such content.  Except for the @samp{--color=html} case,
4966you therefore normally don't need to save output produced with the
4967@code{--color} option in a file.
4968
4969@node The TERM variable
4970@subsection The environment variable @code{TERM}
4971
4972@vindex TERM@r{, environment variable}
The environment variable @code{TERM} contains an identifier for the text
window's capabilities.  You can get a detailed list of these capabilities
by using the @samp{infocmp} command, using @samp{man 5 terminfo} as a
reference.
4977
4978When producing text with embedded color directives, @code{msgcat} looks
4979at the @code{TERM} variable.  Text windows today typically support at least
49808 colors.  Often, however, the text window supports 16 or more colors,
even though the @code{TERM} variable is set to an identifier denoting only
49828 supported colors.  It can be worth setting the @code{TERM} variable to
4983a different value in these cases:
4984
4985@table @code
4986@item xterm
4987@code{xterm} is in most cases built with support for 16 colors.  It can also
4988be built with support for 88 or 256 colors (but not both).  You can try to
4989set @code{TERM} to either @code{xterm-16color}, @code{xterm-88color}, or
4990@code{xterm-256color}.
4991
4992@item rxvt
4993@code{rxvt} is often built with support for 16 colors.  You can try to set
4994@code{TERM} to @code{rxvt-16color}.
4995
4996@item konsole
4997@code{konsole} too is often built with support for 16 colors.  You can try to
4998set @code{TERM} to @code{konsole-16color} or @code{xterm-16color}.
4999@end table
5000
5001After setting @code{TERM}, you can verify it by invoking
5002@samp{msgcat --color=test} and seeing whether the output looks like a
5003reasonable color map.
5004
5005@node The --style option
5006@subsection The @code{--style} option
5007
5008@opindex --style@r{, @code{msgcat} option}
5009The @samp{--style=@var{style_file}} option specifies the style file to use
5010when colorizing.  It has an effect only when the @code{--color} option is
5011effective.
5012
5013@vindex PO_STYLE@r{, environment variable}
5014If the @code{--style} option is not specified, the environment variable
5015@code{PO_STYLE} is considered.  It is meant to point to the user's
5016preferred style for PO files.
5017
5018The default style file is @file{$prefix/share/gettext/styles/po-default.css},
5019where @code{$prefix} is the installation location.
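
The lookup order just described can be summarized by the following C
sketch.  It is an illustration only, not the code used by the
@code{gettext} tools.  The function name @code{resolve_style_file} is
hypothetical, @var{default_path} stands for the installed
@file{po-default.css}, and treating an empty @code{PO_STYLE} like an
unset one is an assumption of this example.

@example
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of gettext.
   STYLE_OPTION is the argument of --style, or NULL if the option was
   not given.  DEFAULT_PATH is the file name of the default style.  */
const char *
resolve_style_file (const char *style_option, const char *default_path)
@{
  const char *preference;

  /* 1. An explicit --style option wins.  */
  if (style_option != NULL)
    return style_option;

  /* 2. Otherwise the PO_STYLE environment variable is considered.
        (Assumption: an empty value counts as unset.)  */
  preference = getenv ("PO_STYLE");
  if (preference != NULL && preference[0] != '\0')
    return preference;

  /* 3. Otherwise fall back to the default style file.  */
  return default_path;
@}
@end example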
5020
5021A few style files are predefined:
5022@table @file
5023@item po-vim.css
5024This style imitates the look used by vim 7.
5025
5026@item po-emacs-x.css
5027This style imitates the look used by GNU Emacs 21 and 22 in an X11 window.
5028
5029@item po-emacs-xterm.css
5030@itemx po-emacs-xterm16.css
5031@itemx po-emacs-xterm256.css
5032This style imitates the look used by GNU Emacs 22 in a terminal of type
5033@samp{xterm} (8 colors) or @samp{xterm-16color} (16 colors) or
5034@samp{xterm-256color} (256 colors), respectively.
5035@end table
5036
5037@noindent
5038You can use these styles without specifying a directory.  They are actually
5039located in @file{$prefix/share/gettext/styles/}, where @code{$prefix} is the
5040installation location.
5041
5042You can also design your own styles.  This is described in the next section.
5043
5044
5045@node Style rules
5046@subsection Style rules for PO files
5047
5048The same style file can be used for styling of a PO file, for terminal
5049output and for HTML output.  It is written in CSS (Cascading Style Sheet)
5050syntax.  See @url{https://www.w3.org/TR/css2/cover.html} for a formal
5051definition of CSS.  Many HTML authoring tutorials also contain explanations
5052of CSS.
5053
5054In the case of HTML output, the style file is embedded in the HTML output.
5055In the case of text output, the style file is interpreted by the
5056@code{msgcat} program.  This means, in particular, that when
5057@code{@@import} is used with relative file names, the file names are
5058
5059@itemize @minus
5060@item
5061relative to the resulting HTML file, in the case of HTML output,
5062
5063@item
5064relative to the style sheet containing the @code{@@import}, in the case of
5065text output.  (Actually, @code{@@import}s are not yet supported in this case,
5066due to a limitation in @code{libcroco}.)
5067@end itemize
5068
5069CSS rules are built up from selectors and declarations.  The declarations
5070specify graphical properties; the selectors specify when they apply.
5071
5072In PO files, the following simple selectors (based on "CSS classes", see
5073the CSS2 spec, section 5.8.3) are supported.
5074
5075@itemize @bullet
5076@item
5077Selectors that apply to entire messages:
5078
5079@table @code
5080@item .header
5081This matches the header entry of a PO file.
5082
5083@item .translated
5084This matches a translated message.
5085
5086@item .untranslated
5087This matches an untranslated message (i.e.@: a message with empty translation).
5088
5089@item .fuzzy
5090This matches a fuzzy message (i.e.@: a message which has a translation that
5091needs review by the translator).
5092
5093@item .obsolete
5094This matches an obsolete message (i.e.@: a message that was translated but is
5095not needed by the current POT file any more).
5096@end table
5097
5098@item
5099Selectors that apply to parts of a message in PO syntax.  Recall the general
5100structure of a message in PO syntax:
5101
@example
@var{white-space}
#  @var{translator-comments}
#. @var{extracted-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
#| msgid @var{previous-untranslated-string}
msgid @var{untranslated-string}
msgstr @var{translated-string}
@end example
5112
5113@table @code
5114@item .comment
5115This matches all comments (translator comments, extracted comments,
5116source file reference comments, flag comments, previous message comments,
5117as well as the entire obsolete messages).
5118
5119@item .translator-comment
5120This matches the translator comments.
5121
5122@item .extracted-comment
This matches the extracted comments, i.e.@: the comments placed by the
programmer for the attention of the translator.
5125
5126@item .reference-comment
5127This matches the source file reference comments (entire lines).
5128
5129@item .reference
5130This matches the individual source file references inside the source file
5131reference comment lines.
5132
5133@item .flag-comment
5134This matches the flag comment lines (entire lines).
5135
5136@item .flag
5137This matches the individual flags inside flag comment lines.
5138
5139@item .fuzzy-flag
5140This matches the `fuzzy' flag inside flag comment lines.
5141
5142@item .previous-comment
5143This matches the comments containing the previous untranslated string (entire
5144lines).
5145
5146@item .previous
5147This matches the previous untranslated string including the string delimiters,
5148the associated keywords (@code{msgid} etc.) and the spaces between them.
5149
5150@item .msgid
5151This matches the untranslated string including the string delimiters,
5152the associated keywords (@code{msgid} etc.) and the spaces between them.
5153
5154@item .msgstr
5155This matches the translated string including the string delimiters,
5156the associated keywords (@code{msgstr} etc.) and the spaces between them.
5157
5158@item .keyword
5159This matches the keywords (@code{msgid}, @code{msgstr}, etc.).
5160
5161@item .string
5162This matches strings, including the string delimiters (double quotes).
5163@end table
5164
5165@item
5166Selectors that apply to parts of strings:
5167
5168@table @code
5169@item .text
5170This matches the entire contents of a string (excluding the string delimiters,
5171i.e.@: the double quotes).
5172
5173@item .escape-sequence
5174This matches an escape sequence (starting with a backslash).
5175
5176@item .format-directive
5177This matches a format string directive (starting with a @samp{%} sign in the
5178case of most programming languages, with a @samp{@{} in the case of
5179@code{java-format} and @code{csharp-format}, with a @samp{~} in the case of
5180@code{lisp-format} and @code{scheme-format}, or with @samp{$} in the case of
5181@code{sh-format}).
5182
5183@item .invalid-format-directive
5184This matches an invalid format string directive.
5185
5186@item .added
5187In an untranslated string, this matches a part of the string that was not
5188present in the previous untranslated string.  (Not yet implemented in this
5189release.)
5190
5191@item .changed
5192In an untranslated string or in a previous untranslated string, this matches
5193a part of the string that is changed or replaced.  (Not yet implemented in
5194this release.)
5195
5196@item .removed
5197In a previous untranslated string, this matches a part of the string that
5198is not present in the current untranslated string.  (Not yet implemented in
5199this release.)
5200@end table
5201@end itemize
5202
5203These selectors can be combined to hierarchical selectors.  For example,
5204
@smallexample
.msgstr .invalid-format-directive @{ color: red; @}
@end smallexample
5208
5209@noindent
5210will highlight the invalid format directives in the translated strings.
5211
5212In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements
5213(CSS2 spec, section 5.12) are not supported.
5214
5215The declarations in HTML mode are not limited; any graphical attribute
5216supported by the browsers can be used.
5217
5218The declarations in text mode are limited to the following properties.  Other
5219properties will be silently ignored.
5220
5221@table @asis
5222@item @code{color} (CSS2 spec, section 14.1)
5223@itemx @code{background-color} (CSS2 spec, section 14.2.1)
These properties are supported.  Colors will be adjusted to match the terminal's
capabilities.  Note that many terminals support only 8 colors.
5226
5227@item @code{font-weight} (CSS2 spec, section 15.2.3)
5228This property is supported, but most terminals can only render two different
5229weights: @code{normal} and @code{bold}.  Values >= 600 are rendered as
5230@code{bold}.
5231
5232@item @code{font-style} (CSS2 spec, section 15.2.3)
5233This property is supported.  The values @code{italic} and @code{oblique} are
5234rendered the same way.
5235
5236@item @code{text-decoration} (CSS2 spec, section 16.3.1)
5237This property is supported, limited to the values @code{none} and
5238@code{underline}.
5239@end table
5240
5241@node Customizing less
5242@subsection Customizing @code{less} for viewing PO files
5243
5244The @samp{less} program is a popular text file browser for use in a text
5245screen or terminal emulator.  It also supports text with embedded escape
5246sequences for colors and text decorations.
5247
You can use @code{less} to view a PO file like this (assuming a UTF-8
environment):

@smallexample
msgcat --to-code=UTF-8 --color xyz.po | less -R
@end smallexample

You can simplify this to the following command:

@smallexample
less xyz.po
@end smallexample

@noindent
after these three preparations:
5263
5264@enumerate
5265@item
5266Add the options @samp{-R} and @samp{-f} to the @code{LESS} environment
5267variable.  In sh shells:
@smallexample
$ LESS="$LESS -R -f"
$ export LESS
@end smallexample
5272
5273@item
5274If your system does not already have the @file{lessopen.sh} and
5275@file{lessclose.sh} scripts, create them and set the @code{LESSOPEN} and
5276@code{LESSCLOSE} environment variables, as indicated in the manual page
5277(@samp{man less}).
5278
5279@item
5280Add to @file{lessopen.sh} a piece of script that recognizes PO files
5281through their file extension and invokes @code{msgcat} on them, producing
5282a temporary file.  Like this:
5283
@smallexample
case "$1" in
  *.po)
    tmpfile=`mktemp "$@{TMPDIR-/tmp@}/less.XXXXXX"`
    msgcat --to-code=UTF-8 --color "$1" > "$tmpfile"
    echo "$tmpfile"
    exit 0
    ;;
esac
@end smallexample
5294@end enumerate
5295
5296@node Other tools
5297@section Other tools for manipulating PO files
5298
5299@cindex Pology
5300The ``Pology'' package is a Free Software package for manipulating PO files.
5301It features, in particular:
5302
5303@itemize
5304@item
5305Examination and in-place modification of collections of PO files.
5306@item
5307Format-aware diffing and patching of PO files.
5308@item
5309Handling of version-control branches.
5310@item
5311Fine-grained asynchronous review workflow.
5312@item
5313Custom translation validation.
5314@item
5315Language and project specific support.
5316@end itemize
5317
5318Its home page is at @url{http://pology.nedohodnik.net/}.
5319
5320@node libgettextpo
5321@section Writing your own programs that process PO files
5322
5323For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc.
5324is not sufficient, a set of C functions is provided in a library, to make it
5325possible to process PO files in your own programs.  When you use this library,
you don't need to write routines to parse the PO file; instead, you retrieve
a pointer in memory to each of the messages contained in the PO file.  Functions
5328for writing those memory structures to a file after working with them are
5329provided too.
5330
5331The functions are declared in the header file @samp{<gettext-po.h>}, and are
5332defined in a library called @samp{libgettextpo}.
5333
5334@menu
5335* Error Handling::              Error handling functions
5336* po_file_t API::               File management
5337* po_message_iterator_t API::   Message iteration
5338* po_message_t API::            The basic units of the file
5339* PO Header Entry API::         Meta information of the file
5340* po_filepos_t API::            References to the sources
5341* Format Type API::             Supported format types
5342* Checking API::                Enforcing constraints
5343@end menu
5344
The following example shows how these functions can be used.  Error
handling code is omitted, as its implementation is delegated to the
user-provided functions.

@example
struct po_xerror_handler handler =
  @{
    .xerror = @dots{},
    .xerror2 = @dots{}
  @};
const char *filename = @dots{};
/* Read the file into memory.  */
po_file_t file = po_file_read (filename, &handler);

@{
  const char * const *domains = po_file_domains (file);
  const char * const *domainp;

  /* Iterate the domains contained in the file.  */
  for (domainp = domains; *domainp; domainp++)
    @{
      /* po_message_t is itself a pointer type.  */
      po_message_t message;
      const char *domain = *domainp;
      po_message_iterator_t iterator = po_message_iterator (file, domain);

      /* Iterate each message inside the domain.  */
      while ((message = po_next_message (iterator)) != NULL)
        @{
          /* Read data from the message @dots{}  */
          const char *msgid = po_message_msgid (message);
          const char *msgstr = po_message_msgstr (message);

          @dots{}

          /* Modify its contents @dots{}  */
          if (perform_some_tests (msgid, msgstr))
            po_message_set_fuzzy (message, 1);

          @dots{}
        @}
      /* Always release returned po_message_iterator_t.  */
      po_message_iterator_free (iterator);
    @}

  /* Write back the result.  */
  po_file_t result = po_file_write (file, filename, &handler);
@}

/* Always release the returned po_file_t.  */
po_file_free (file);
@end example
5396
5397@node Error Handling
5398@subsection Error Handling
5399
5400Error management is performed through callbacks provided by the user of
5401the library.  They are provided through a parameter with the following
5402type:
5403
5404@deftp {Data Type} struct po_xerror_handler
The corresponding pointer type is @code{po_xerror_handler_t}.  The
structure contains two fields, @code{xerror} and @code{xerror2}, with the
following function signatures.
5408@end deftp
5409
5410@deftypefun void xerror (int@tie{}@var{severity}, po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{filename}, size_t@tie{}@var{lineno}, size_t@tie{}@var{column}, int@tie{}@var{multiline_p}, const@tie{}char@tie{}*@var{message_text})
5411
5412This function is called to signal a problem of the given @var{severity}.
5413It @emph{must not return} if @var{severity} is
5414@code{PO_SEVERITY_FATAL_ERROR}.
5415
5416@var{message_text} is the problem description.  When @var{multiline_p}
5417is true, it can contain multiple lines of text, each terminated with a
5418newline, otherwise a single line.
5419
5420@var{message} and/or @var{filename} and @var{lineno} indicate where the
5421problem occurred:
5422
5423@itemize @bullet
5424@item
5425If @var{filename} is @code{NULL}, @var{filename} and @var{lineno} and
5426@var{column} should be ignored.
5427
5428@item
5429If @var{lineno} is @code{(size_t)(-1)}, @var{lineno} and @var{column}
5430should be ignored.
5431
5432@item
5433If @var{column} is @code{(size_t)(-1)}, it should be ignored.
5434@end itemize
5435@end deftypefun
5436
5437@deftypefun void xerror2 (int@tie{}@var{severity}, po_message_t@tie{}@var{message1}, const@tie{}char@tie{}*@var{filename1}, size_t@tie{}@var{lineno1}, size_t@tie{}@var{column1}, int@tie{}@var{multiline_p1}, const@tie{}char@tie{}*@var{message_text1}, po_message_t@tie{}@var{message2}, const@tie{}char@tie{}*@var{filename2}, size_t@tie{}@var{lineno2}, size_t@tie{}@var{column2}, int@tie{}@var{multiline_p2}, const@tie{}char@tie{}*@var{message_text2})
5438
5439This function is called to signal a problem of the given @var{severity}
5440that refers to two messages.  It @emph{must not return} if
5441@var{severity} is @code{PO_SEVERITY_FATAL_ERROR}.
5442
It is similar to two calls to @code{xerror}.  If possible, an ellipsis can be
5444appended to @var{message_text1} and prepended to @var{message_text2}.
5445@end deftypefun
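
A minimal pair of callbacks might look as follows.  This is a sketch under
stated assumptions, not a reference implementation: the names
@code{report_problem}, @code{my_xerror}, @code{my_xerror2} and
@code{my_handler} are hypothetical, and printing to standard error and
exiting with @code{EXIT_FAILURE} are choices made for this example only.
The @var{message} arguments are ignored here.

@example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <gettext-po.h>

/* Print one problem description, preceded by its location if known.  */
static void
report_problem (const char *filename, size_t lineno, size_t column,
                const char *message_text)
@{
  size_t len = strlen (message_text);

  if (filename != NULL)
    @{
      fprintf (stderr, "%s:", filename);
      if (lineno != (size_t)(-1))
        @{
          fprintf (stderr, "%lu:", (unsigned long) lineno);
          if (column != (size_t)(-1))
            fprintf (stderr, "%lu:", (unsigned long) column);
        @}
      fputc (' ', stderr);
    @}
  fputs (message_text, stderr);
  /* Make sure the report ends with a newline.  */
  if (len == 0 || message_text[len - 1] != '\n')
    fputc ('\n', stderr);
@}

static void
my_xerror (int severity, po_message_t message,
           const char *filename, size_t lineno, size_t column,
           int multiline_p, const char *message_text)
@{
  (void) message;
  (void) multiline_p;
  report_problem (filename, lineno, column, message_text);
  /* The callback must not return for fatal errors.  */
  if (severity == PO_SEVERITY_FATAL_ERROR)
    exit (EXIT_FAILURE);
@}

static void
my_xerror2 (int severity,
            po_message_t message1,
            const char *filename1, size_t lineno1, size_t column1,
            int multiline_p1, const char *message_text1,
            po_message_t message2,
            const char *filename2, size_t lineno2, size_t column2,
            int multiline_p2, const char *message_text2)
@{
  (void) message1;
  (void) multiline_p1;
  (void) message2;
  (void) multiline_p2;
  report_problem (filename1, lineno1, column1, message_text1);
  report_problem (filename2, lineno2, column2, message_text2);
  if (severity == PO_SEVERITY_FATAL_ERROR)
    exit (EXIT_FAILURE);
@}

static const struct po_xerror_handler my_handler =
  @{
    .xerror = my_xerror,
    .xerror2 = my_xerror2
  @};
@end example

@noindent
The address @code{&my_handler} can then be passed as the @var{handler}
argument of functions such as @code{po_file_read}.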
5446
5447@node po_file_t API
5448@subsection po_file_t API
5449
5450@deftp {Data Type} po_file_t
5451This is a pointer type that refers to the contents of a PO file, after it has
5452been read into memory.
5453@end deftp
5454
5455@deftypefun po_file_t po_file_create ()
5456The @code{po_file_create} function creates an empty PO file representation in
5457memory.
5458@end deftypefun
5459
5460@deftypefun po_file_t po_file_read (const@tie{}char@tie{}*@var{filename}, struct@tie{}po_xerror_handler@tie{}*@var{handler})
5461The @code{po_file_read} function reads a PO file into memory.  The file name
5462is given as argument.  The return value is a handle to the PO file's contents,
5463valid until @code{po_file_free} is called on it.  In case of error, the
5464functions from @var{handler} are called to signal it.
5465
5466This function is exported as @samp{po_file_read_v3} at ABI level, but is
5467defined as @code{po_file_read} in C code after the inclusion of
5468@samp{<gettext-po.h>}.
5469@end deftypefun
5470
5471@deftypefun po_file_t po_file_write (po_file_t@tie{}@var{file}, const@tie{}char@tie{}*@var{filename}, struct@tie{}po_xerror_handler@tie{}*@var{handler})
The @code{po_file_write} function writes the contents of the memory
structure @var{file} to the file named @var{filename}.  The return value is
@var{file} after a successful operation.  In case of error, the
functions from @var{handler} are called to signal it.
5476
5477This function is exported as @samp{po_file_write_v2} at ABI level, but
5478is defined as @code{po_file_write} in C code after the inclusion of
5479@samp{<gettext-po.h>}.
5480@end deftypefun
5481
5482@deftypefun void po_file_free (po_file_t@tie{}@var{file})
5483The @code{po_file_free} function frees a PO file's contents from memory,
5484including all messages that are only implicitly accessible through iterators.
5485@end deftypefun
5486
5487@deftypefun {const char * const *} po_file_domains (po_file_t@tie{}@var{file})
5488The @code{po_file_domains} function returns the domains for which the given
5489PO file has messages.  The return value is a @code{NULL} terminated array
5490which is valid as long as the @var{file} handle is valid.  For PO files which
5491contain no @samp{domain} directive, the return value contains only one domain,
5492namely the default domain @code{"messages"}.
5493@end deftypefun
5494
5495@node po_message_iterator_t API
5496@subsection po_message_iterator_t API
5497
5498@deftp {Data Type} po_message_iterator_t
5499This is a pointer type that refers to an iterator that produces a sequence of
5500messages.
5501@end deftp
5502
5503@deftypefun po_message_iterator_t po_message_iterator (po_file_t@tie{}@var{file}, const@tie{}char@tie{}*@var{domain})
The @code{po_message_iterator} function returns an iterator that will produce the
5505messages of @var{file} that belong to the given @var{domain}.  If @var{domain}
5506is @code{NULL}, the default domain is used instead.  To list the messages,
5507use the function @code{po_next_message} repeatedly.
5508@end deftypefun
5509
5510@deftypefun void po_message_iterator_free (po_message_iterator_t@tie{}@var{iterator})
5511The @code{po_message_iterator_free} function frees an iterator previously
5512allocated through the @code{po_message_iterator} function.
5513@end deftypefun
5514
5515@deftypefun po_message_t po_next_message (po_message_iterator_t@tie{}@var{iterator})
5516The @code{po_next_message} function returns the next message from
5517@var{iterator} and advances the iterator.  It returns @code{NULL} when the
5518iterator has reached the end of its message list.
5519@end deftypefun
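
As an illustration of the iterator functions, the following sketch counts
the untranslated messages of the default domain.  The helper name
@code{count_untranslated} is hypothetical.  The sketch uses the accessors
@code{po_message_msgid} and @code{po_message_msgstr}
(@pxref{po_message_t API}), skips the header entry by its empty
@code{msgid}, and does not treat obsolete messages or messages with
plural forms specially.

@example
#include <stddef.h>
#include <gettext-po.h>

/* Hypothetical helper: count the messages in the default domain of
   FILE that have an empty msgstr.  */
size_t
count_untranslated (po_file_t file)
@{
  size_t count = 0;
  /* A NULL domain selects the default domain.  */
  po_message_iterator_t iterator = po_message_iterator (file, NULL);
  po_message_t message;

  while ((message = po_next_message (iterator)) != NULL)
    @{
      const char *msgid = po_message_msgid (message);
      const char *msgstr = po_message_msgstr (message);

      /* Skip the header entry; count empty translations.  */
      if (msgid[0] != '\0' && msgstr[0] == '\0')
        count++;
    @}
  po_message_iterator_free (iterator);
  return count;
@}
@end example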
5520
5521@node po_message_t API
5522@subsection po_message_t API
5523
5524@deftp {Data Type} po_message_t
5525This is a pointer type that refers to a message of a PO file, including its
5526translation.
5527@end deftp
5528
5529@deftypefun {po_message_t} po_message_create (void)
5530Returns a freshly constructed message.  To finish initializing the
5531message, you must set the @code{msgid} and @code{msgstr}.  It @emph{must} be
5532inserted into a file to manage its memory, as there is no
5533@code{po_message_free} available to the user of the library.
5534@end deftypefun
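
The following sketch shows one way to create a message and hand it over to
a file.  The helper name @code{add_message} is hypothetical.  The sketch
relies on the function @code{po_message_insert}, declared in
@samp{<gettext-po.h>}, and on the setter functions
@code{po_message_set_msgid} and @code{po_message_set_msgstr}.

@example
#include <stddef.h>
#include <gettext-po.h>

/* Hypothetical helper: create a message with the given MSGID and
   MSGSTR and insert it at the current position of ITERATOR.  */
po_message_t
add_message (po_message_iterator_t iterator,
             const char *msgid, const char *msgstr)
@{
  po_message_t message = po_message_create ();

  /* Finish the initialization of the fresh message.  */
  po_message_set_msgid (message, msgid);
  po_message_set_msgstr (message, msgstr);

  /* From here on, the file behind ITERATOR manages the memory
     of the message.  */
  po_message_insert (iterator, message);
  return message;
@}
@end example

@noindent
A suitable iterator can be obtained, for example, by calling
@code{po_message_iterator (file, NULL)} on a file returned by
@code{po_file_create}.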
5535
5536The following functions access details of a @code{po_message_t}.  Recall
5537that the results are valid as long as the @var{file} handle is valid.
5538
5539@deftypefun {const char *} po_message_msgctxt (po_message_t@tie{}@var{message})
5540The @code{po_message_msgctxt} function returns the @code{msgctxt}, the
5541context of @var{message}.  Returns @code{NULL} for a message not restricted
5542to a context.
5543@end deftypefun
5544
5545@deftypefun {void} po_message_set_msgctxt (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgctxt})
5546The @code{po_message_set_msgctxt} function changes the @code{msgctxt},
5547the context of the message, to the value provided through
5548@var{msgctxt}.  The value @code{NULL} removes the restriction.
5549@end deftypefun
5550
5551@deftypefun {const char *} po_message_msgid (po_message_t@tie{}@var{message})
5552The @code{po_message_msgid} function returns the @code{msgid} (untranslated
5553English string) of @var{message}.  This is guaranteed to be non-@code{NULL}.
5554@end deftypefun
5555
5556@deftypefun {void} po_message_set_msgid (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgid})
5557The @code{po_message_set_msgid} function changes the @code{msgid}
5558(untranslated English string) of @var{message} to the value provided through
5559@var{msgid}, a non-@code{NULL} string.
5560@end deftypefun
5561
5562@deftypefun {const char *} po_message_msgid_plural (po_message_t@tie{}@var{message})
5563The @code{po_message_msgid_plural} function returns the @code{msgid_plural}
5564(untranslated English plural string) of @var{message}, a message with plurals,
5565or @code{NULL} for a message without plural.
5566@end deftypefun
5567
5568@deftypefun {void} po_message_set_msgid_plural (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgid_plural})
5569The @code{po_message_set_msgid_plural} function changes the
5570@code{msgid_plural} (untranslated English plural string) of a message to
5571the value provided through @var{msgid_plural}, or removes the plurals if
5572@code{NULL} is provided as @var{msgid_plural}.
5573@end deftypefun
5574
5575@deftypefun {const char *} po_message_msgstr (po_message_t@tie{}@var{message})
5576The @code{po_message_msgstr} function returns the @code{msgstr} (translation)
5577of @var{message}.  For an untranslated message, the return value is an empty
5578string.
5579@end deftypefun
5580
5581@deftypefun {void} po_message_set_msgstr (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgstr})
5582The @code{po_message_set_msgstr} function changes the @code{msgstr}
5583(translation) of @var{message} to the value provided through @var{msgstr}, a
5584non-@code{NULL} string.
5585@end deftypefun
5586
5587@deftypefun {const char *} po_message_msgstr_plural (po_message_t@tie{}@var{message}, int@tie{}@var{index})
5588The @code{po_message_msgstr_plural} function returns the
5589@code{msgstr[@var{index}]} of @var{message}, a message with plurals, or
5590@code{NULL} when the @var{index} is out of range or for a message without
5591plural.
5592@end deftypefun
5593
5594@deftypefun {void} po_message_set_msgstr_plural (po_message_t@tie{}@var{message}, int@tie{}@var{index}, const@tie{}char@tie{}*@var{msgstr_plural})
5595The @code{po_message_set_msgstr_plural} function changes the
5596@code{msgstr[@var{index}]} of @var{message}, a message with plurals, to
5597the value provided through @var{msgstr_plural}. @var{message} must be a
5598message with plurals.
5599Use @code{NULL} as the value of @var{msgstr_plural} with
5600@var{index} pointing to the last element to reduce the number of plural
5601forms.
5602@end deftypefun
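
Since @code{po_message_msgstr_plural} returns @code{NULL} for an index
that is out of range, the number of plural forms of a message can be
determined by probing increasing indices.  The following is a minimal
sketch; the helper name @code{count_plural_forms} is hypothetical.

@example
#include <stddef.h>
#include <gettext-po.h>

/* Hypothetical helper: return the number of msgstr[N] entries of
   MESSAGE, or 0 if MESSAGE is a message without plural.  */
int
count_plural_forms (po_message_t message)
@{
  int n = 0;

  if (po_message_msgid_plural (message) == NULL)
    return 0;
  /* Probe msgstr[0], msgstr[1], ... until the index is out of range.  */
  while (po_message_msgstr_plural (message, n) != NULL)
    n++;
  return n;
@}
@end example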
5603
5604@deftypefun {const char *} po_message_comments (po_message_t@tie{}@var{message})
5605The @code{po_message_comments} function returns the comments of @var{message},
5606a multiline string, ending in a newline, or a non-@code{NULL} empty string.
5607@end deftypefun
5608
5609@deftypefun {void} po_message_set_comments (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{comments})
5610The @code{po_message_set_comments} function changes the comments of
5611@var{message} to the value @var{comments}, a multiline string, ending in a
5612newline, or a non-@code{NULL} empty string.
5613@end deftypefun
5614
5615@deftypefun {const char *} po_message_extracted_comments (po_message_t@tie{}@var{message})
5616The @code{po_message_extracted_comments} function returns the extracted
5617comments of @var{message}, a multiline string, ending in a newline, or a
5618non-@code{NULL} empty string.
5619@end deftypefun
5620
5621@deftypefun {void} po_message_set_extracted_comments (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{extracted_comments})
The @code{po_message_set_extracted_comments} function changes the
extracted comments of @var{message} to the value @var{extracted_comments}, a
multiline string, ending in a newline, or a non-@code{NULL} empty string.
5625@end deftypefun
5626
5627@deftypefun {const char *} po_message_prev_msgctxt (po_message_t@tie{}@var{message})
5628The @code{po_message_prev_msgctxt} function returns the previous
5629@code{msgctxt}, the previous context of @var{message}.  Return
5630@code{NULL} for a message that does not have a previous context.
5631@end deftypefun
5632
5633@deftypefun {void} po_message_set_prev_msgctxt (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{prev_msgctxt})
The @code{po_message_set_prev_msgctxt} function changes the previous
@code{msgctxt}, the previous context of @var{message}, to the value provided
through @var{prev_msgctxt}.  The value @code{NULL} removes the stored
previous @code{msgctxt}.
5638@end deftypefun
5639
5640@deftypefun {const char *} po_message_prev_msgid (po_message_t@tie{}@var{message})
5641The @code{po_message_prev_msgid} function returns the previous
5642@code{msgid} (untranslated English string) of @var{message}, or
5643@code{NULL} if there is no previous @code{msgid} stored.
5644@end deftypefun
5645
5646@deftypefun {void} po_message_set_prev_msgid (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{prev_msgid})
The @code{po_message_set_prev_msgid} function changes the previous
@code{msgid} (untranslated English string) of @var{message} to the value
provided through @var{prev_msgid}, or removes the stored previous
@code{msgid} when it is @code{NULL}.
5651@end deftypefun
5652
5653@deftypefun {const char *} po_message_prev_msgid_plural (po_message_t@tie{}@var{message})
The @code{po_message_prev_msgid_plural} function returns the previous
@code{msgid_plural} (untranslated English plural string) of
@var{message}, a message with plurals, or @code{NULL} for a message
without plural or without any stored previous @code{msgid_plural}.
5658@end deftypefun
5659
5660@deftypefun {void} po_message_set_prev_msgid_plural (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{prev_msgid_plural})
5661The @code{po_message_set_prev_msgid_plural} function changes the
5662previous @code{msgid_plural} (untranslated English plural string) of a
5663message to the value provided through @var{prev_msgid_plural}, or
5664removes the stored previous @code{msgid_plural} if @code{NULL} is
5665provided as @var{prev_msgid_plural}.
5666@end deftypefun
5667
5668@deftypefun {int} po_message_is_obsolete (po_message_t@tie{}@var{message})
5669The @code{po_message_is_obsolete} function returns true when @var{message}
5670is marked as obsolete.
5671@end deftypefun
5672
5673@deftypefun {void} po_message_set_obsolete (po_message_t@tie{}@var{message}, int@tie{}@var{obsolete})
5674The @code{po_message_set_obsolete} function changes the obsolete mark of
5675@var{message}.
5676@end deftypefun
5677
5678@deftypefun {int} po_message_is_fuzzy (po_message_t@tie{}@var{message})
5679The @code{po_message_is_fuzzy} function returns true when @var{message}
5680is marked as fuzzy.
5681@end deftypefun
5682
5683@deftypefun {void} po_message_set_fuzzy (po_message_t@tie{}@var{message}, int@tie{}@var{fuzzy})
5684The @code{po_message_set_fuzzy} function changes the fuzzy mark of
5685@var{message}.
5686@end deftypefun
5687
5688@deftypefun {int} po_message_is_format (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{format_type})
5689The @code{po_message_is_format} function returns true when the message
5690is marked as being a format string of @var{format_type}.
5691@end deftypefun
5692
5693@deftypefun {void} po_message_set_format (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{format_type}, int@tie{}@var{value})
The @code{po_message_set_format} function changes the format mark of
the message for the @var{format_type} provided.
5696@end deftypefun
5697
5698@deftypefun {int} po_message_is_range (po_message_t@tie{}@var{message}, int@tie{}*@var{minp}, int@tie{}*@var{maxp})
The @code{po_message_is_range} function returns true when the message
has a numeric range set, and stores the minimum and maximum value in the
locations pointed to by @var{minp} and @var{maxp} respectively.
5702@end deftypefun
5703
5704@deftypefun {void} po_message_set_range (po_message_t@tie{}@var{message}, int@tie{}@var{min}, int@tie{}@var{max})
5705The @code{po_message_set_range} function changes the numeric range of
5706the message. @var{min} and @var{max} must be non-negative, with
5707@var{min} < @var{max}.  Use @var{min} and @var{max} with value @code{-1}
5708to remove the numeric range of @var{message}.
5709@end deftypefun
5710
5711@node PO Header Entry API
5712@subsection PO Header Entry API
5713
5714The following functions provide an interface to extract and manipulate
5715the header entry (@pxref{Header Entry}) from a file loaded in memory.
5716The meta information must be written back into the domain message with
5717the empty string as @code{msgid}.
5718
5719@deftypefun {const char *} po_file_domain_header (po_file_t@tie{}@var{file}, const@tie{}char@tie{}*@var{domain})
5720Returns the header entry of a domain from @var{file}, a PO file loaded in
5721memory.  The value @code{NULL} provided as @var{domain} denotes the
5722default domain.  Returns @code{NULL} if there is no header entry.
5723@end deftypefun
5724
5725@deftypefun {char *} po_header_field (const@tie{}char@tie{}*@var{header}, const@tie{}char@tie{}*@var{field})
5726Returns the value of @var{field} in the @var{header} entry.  The return
5727value is either a freshly allocated string, to be freed by the caller,
5728or @code{NULL}.
5729@end deftypefun
5730
5731@deftypefun {char *} po_header_set_field (const@tie{}char@tie{}*@var{header}, const@tie{}char@tie{}*@var{field}, const@tie{}char@tie{}*@var{value})
5732Returns a freshly allocated string which contains the entry from
5733@var{header} with @var{field} set to @var{value}.  The field is added if
5734necessary.
5735@end deftypefun
5736
5737@node po_filepos_t API
5738@subsection po_filepos_t API
5739
5740@deftp {Data Type} po_filepos_t
5741This is a pointer type that refers to a string's position within a
5742source file.
5743@end deftp
5744
5745The following functions provide an interface to extract and manipulate
5746these references.
5747
5748@deftypefun {po_filepos_t} po_message_filepos (po_message_t@tie{}@var{message}, int@tie{}@var{index})
5749Returns the file reference in position @var{index} from the message.  If
5750@var{index} is out of range, returns @code{NULL}.
5751@end deftypefun
5752
5753@deftypefun {void} po_message_remove_filepos (po_message_t@tie{}@var{message}, int@tie{}@var{index})
5754Removes the file reference in position @var{index} from the message.  It
5755moves all references following @var{index} one position backwards.
5756@end deftypefun
5757
5758@deftypefun {void} po_message_add_filepos (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{file}, size_t@tie{}@var{start_line})
5759Adds a reference to the string from @var{file} starting at
5760@var{start_line}, if it is not already present for the message.  The
5761value @code{(size_t)(-1)} for @var{start_line} denotes that the line
5762number is not available.
5763@end deftypefun
5764
5765@node Format Type API
5766@subsection Format Type API
5767
5768@deftypefun {const char * const *} po_format_list (void)
5769Returns a @code{NULL} terminated array of the supported format types.
5770@end deftypefun
5771
5772@deftypefun {const char *} po_format_pretty_name (const@tie{}char@tie{}*@var{format_type})
5773Returns the pretty name associated with @var{format_type}.  For example,
5774it returns ``C#'' when @var{format_type} is ``csharp_format''.
5775Return @code{NULL} if @var{format_type} is not a supported format type.
5776@end deftypefun
5777
5778@node Checking API
5779@subsection Checking API
5780
5781@deftypefun {void} po_file_check_all (po_file_t@tie{}@var{file}, po_xerror_handler_t@tie{}@var{handler})
5782Tests whether the entire @var{file} is valid, like @code{msgfmt} does it.  If it
5783is invalid, passes the reasons to @var{handler}.
5784@end deftypefun
5785
5786@deftypefun {void} po_message_check_all (po_message_t@tie{}@var{message}, po_message_iterator_t@tie{}@var{iterator}, po_xerror_handler_t@tie{}@var{handler})
5787Tests @var{message}, to be inserted at @var{iterator} in a PO file in memory,
5788like @code{msgfmt} does it.  If it is invalid, passes the reasons to
5789@var{handler}.  @var{iterator} is not modified by this call; it only
5790specifies the file and the domain.
5791@end deftypefun
5792
5793@deftypefun {void} po_message_check_format (po_message_t@tie{}@var{message}, po_xerror_handler_t@tie{}@var{handler})
5794Tests whether the message translation from @var{message} is a valid
5795format string if the message is marked as being a format string.  If it
5796is invalid, passes the reasons to @var{handler}.
5797
5798This function is exported as @samp{po_message_check_format_v2} at ABI
5799level, but is defined as @code{po_message_check_format} in C code after
5800the inclusion of @samp{<gettext-po.h>}.
5801@end deftypefun
5802
5803@node Binaries
5804@chapter Producing Binary MO Files
5805
5806@c FIXME: Rewrite.
5807
5808@menu
5809* msgfmt Invocation::           Invoking the @code{msgfmt} Program
5810* msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
5811* MO Files::                    The Format of GNU MO Files
5812@end menu
5813
5814@node msgfmt Invocation
5815@section Invoking the @code{msgfmt} Program
5816
5817@include msgfmt.texi
5818
5819@node msgunfmt Invocation
5820@section Invoking the @code{msgunfmt} Program
5821
5822@include msgunfmt.texi
5823
5824@node MO Files
5825@section The Format of GNU MO Files
5826@cindex MO file's format
5827@cindex file format, @file{.mo}
5828
5829The format of the generated MO files is best described by a picture,
5830which appears below.
5831
5832@cindex magic signature of MO files
5833The first two words serve the identification of the file.  The magic
5834number will always signal GNU MO files.  The number is stored in the
5835byte order used when the MO file was generated, so the magic number
5836really is two numbers: @code{0x950412de} and @code{0xde120495}.
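
As an illustration of the two magic values, the following sketch reads
the first word of a buffer in the byte order of the running machine and
reports whether the rest of the file would need byte swapping.  The
names @code{mo_detect_byte_order} and @code{enum mo_byte_order} are
hypothetical and are not part of GNU @code{gettext}.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC          0x950412deu
#define MO_MAGIC_SWAPPED  0xde120495u

/* Result of inspecting the first word of a candidate MO file.  */
enum mo_byte_order { MO_NOT_MO = 0, MO_NATIVE = 1, MO_SWAPPED = 2 };

/* Inspect the first four bytes of DATA, read in host byte order.  */
enum mo_byte_order
mo_detect_byte_order (const unsigned char *data, size_t size)
{
  uint32_t magic;

  assert (data != NULL);
  if (size < sizeof magic)
    return MO_NOT_MO;
  memcpy (&magic, data, sizeof magic);  /* avoids an unaligned read */
  if (magic == MO_MAGIC)
    return MO_NATIVE;                   /* written in our byte order */
  if (magic == MO_MAGIC_SWAPPED)
    return MO_SWAPPED;                  /* written in the other byte order */
  return MO_NOT_MO;
}
```

A reader that gets @code{MO_SWAPPED} would have to swap the bytes of
every further 32-bit word it reads from the file.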
5837
5838The second word describes the current revision of the file format,
5839composed of a major and a minor revision number.  The revision numbers
5840ensure that the readers of MO files can distinguish new formats from
5841old ones and handle their contents, as far as possible.  For now the
5842major revision is 0 or 1, and the minor revision is also 0 or 1.  More
5843revisions might be added in the future.  A program seeing an unexpected
5844major revision number should stop reading the MO file entirely; whereas
5845an unexpected minor revision number means that the file can be read but
5846will not reveal its full contents, when parsed by a program that
5847supports only smaller minor revision numbers.
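
The rule for unexpected revisions could be sketched as follows.  The bit
layout used here (major revision in the upper 16 bits of the word, minor
revision in the lower 16 bits) is an assumption that this section does
not state, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: major revision in the upper 16 bits, minor revision in
   the lower 16 bits of the revision word.  */
unsigned int
mo_revision_major (uint32_t revision)
{
  return revision >> 16;
}

unsigned int
mo_revision_minor (uint32_t revision)
{
  return revision & 0xffff;
}

/* Compose a revision word under the same assumption.  */
uint32_t
mo_make_revision (unsigned int major, unsigned int minor)
{
  assert (major <= 0xffff && minor <= 0xffff);
  return ((uint32_t) major << 16) | minor;
}

/* A reader that knows major revisions 0 and 1 stops on anything else.
   An unknown minor revision does not prevent reading.  */
int
mo_revision_readable (uint32_t revision)
{
  return mo_revision_major (revision) <= 1;
}
```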
5848
5849The version is kept
5850separate from the magic number, instead of using different magic
5851numbers for different formats, mainly because @file{/etc/magic} is
5852not updated often.
5853
Then follow a number of pointers to later tables in the file, allowing
for the extension of the prefix part of MO files without having to
recompile programs reading them.  This might become useful for later
inserting a few flag bits, an indication of the charset used, new
tables, or other things.
5859
5860Then, at offset @var{O} and offset @var{T} in the picture, two tables
5861of string descriptors can be found.  In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
5863another for the offset of the string in the MO file, counting in bytes
5864from the start of the file.  The first table contains descriptors
5865for the original strings, and is sorted so the original strings
5866are in increasing lexicographical order.  The second table contains
5867descriptors for the translated strings, and is parallel to the first
5868table: to find the corresponding translation one has to access the
5869array slot in the second array with the same index.
5870
Having the original strings sorted enables the use of simple binary
search, for when the MO file does not contain a hashing table, or
for when it is not practical to use the hashing table provided in
the MO file.  This also has another advantage: in a GNU
@code{gettext} PO file, the empty string is usually @emph{translated} into
some system information attached to that particular MO file, and the
empty string necessarily becomes the first in both the original and
translated tables, making the system information very easy to find.
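
The binary search over the two parallel tables might look like the
following minimal sketch.  It assumes that the whole MO file is held in
memory, that its words are stored least significant byte first, and that
the sort order of the original strings agrees with @code{strcmp}.  It
performs no bounds checking, so it is only suitable for trusted input.
The name @code{mo_lookup} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word stored least significant byte first.  */
static uint32_t
le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Return the translation of MSGID found in the MO image DATA, or NULL.
   ASSUMPTIONS: little-endian image, trusted offsets, strcmp ordering.  */
const char *
mo_lookup (const unsigned char *data, size_t size, const char *msgid)
{
  uint32_t n, o, t;
  size_t lo, hi;

  assert (data != NULL && msgid != NULL);
  if (size < 28)
    return NULL;
  n = le32 (data + 8);    /* N: number of strings */
  o = le32 (data + 12);   /* O: table of original strings */
  t = le32 (data + 16);   /* T: table of translations */

  lo = 0;
  hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      /* Each descriptor is (length, offset); the offset is 4 bytes in.  */
      uint32_t str_off = le32 (data + o + 8 * mid + 4);
      int cmp = strcmp (msgid, (const char *) data + str_off);

      if (cmp == 0)
        /* Same index in the second table gives the translation.  */
        return (const char *) data + le32 (data + t + 8 * mid + 4);
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return NULL;
}
```

Looking up the empty string with such a function returns the system
information, since that entry is always the first one.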
5879
5880@cindex hash table, inside MO files
5881The size @var{S} of the hash table can be zero.  In this case, the
5882hash table itself is not contained in the MO file.  Some people might
5883prefer this because a precomputed hashing table takes disk space, and
5884does not win @emph{that} much speed.  The hash table contains indices
5885to the sorted array of strings in the MO file.  Conflict resolution is
5886done by double hashing.  The precise hashing algorithm used is fairly
5887dependent on GNU @code{gettext} code, and is not documented here.
5888
As for the strings themselves, they follow the hash table, and each
5890is terminated with a @key{NUL}, and this @key{NUL} is not counted in
5891the length which appears in the string descriptor.  The @code{msgfmt}
5892program has an option selecting the alignment for MO file strings.
5893With this option, each string is separately aligned so it starts at
5894an offset which is a multiple of the alignment value.  On some RISC
5895machines, a correct alignment will speed things up.
5896
5897@cindex context, in MO files
5898Contexts are stored by storing the concatenation of the context, a
5899@key{EOT} byte, and the original string, instead of the original string.
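
A program that looks up a message with a context in an MO file therefore
first has to build the combined key.  The sketch below does this with a
hypothetical helper, @code{mo_make_key}, using the byte value 4 for
@key{EOT}.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated key for MSGID in context MSGCTXT, to be
   freed by the caller.  A NULL MSGCTXT means "no context", in which
   case the key is just a copy of MSGID.  Returns NULL when out of
   memory.  */
char *
mo_make_key (const char *msgctxt, const char *msgid)
{
  size_t ctxt_len, id_len;
  char *key;

  assert (msgid != NULL);
  /* Space for the context plus the EOT separator, if any.  */
  ctxt_len = msgctxt != NULL ? strlen (msgctxt) + 1 : 0;
  id_len = strlen (msgid);
  key = malloc (ctxt_len + id_len + 1);
  if (key == NULL)
    return NULL;
  if (msgctxt != NULL)
    {
      memcpy (key, msgctxt, ctxt_len - 1);
      key[ctxt_len - 1] = '\004';       /* EOT */
    }
  memcpy (key + ctxt_len, msgid, id_len + 1);  /* includes the NUL */
  return key;
}
```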
5900
5901@cindex plural forms, in MO files
5902Plural forms are stored by letting the plural of the original string
5903follow the singular of the original string, separated through a
5904@key{NUL} byte.  The length which appears in the string descriptor
5905includes both.  However, only the singular of the original string
5906takes part in the hash table lookup.  The plural variants of the
5907translation are all stored consecutively, separated through a
5908@key{NUL} byte.  Here also, the length in the string descriptor
5909includes all of them.
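
Since the plural variants of a translation are stored back to back, a
reader can pick one of them by skipping over @key{NUL} bytes.  The
following sketch shows one way to do this; @code{mo_plural_variant} is a
hypothetical name, and @var{length} is the length taken from the string
descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return variant number INDEX (counting from 0) of TRANSLATION, whose
   descriptor length is LENGTH, or NULL if there are fewer variants.  */
const char *
mo_plural_variant (const char *translation, size_t length,
                   unsigned int index)
{
  const char *p = translation;
  const char *end = translation + length;  /* the terminating NUL */

  assert (translation != NULL);
  while (index > 0)
    {
      /* Find the NUL that separates this variant from the next one.  */
      const char *nul = memchr (p, '\0', (size_t) (end - p));

      if (nul == NULL)
        return NULL;            /* ran past the last variant */
      p = nul + 1;
      index--;
    }
  return p;
}
```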
5910
5911Nothing prevents a MO file from having embedded @key{NUL}s in strings.
5912However, the program interface currently used already presumes
5913that strings are @key{NUL} terminated, so embedded @key{NUL}s are
5914somewhat useless.  But the MO file format is general enough so other
5915interfaces would be later possible, if for example, we ever want to
5916implement wide characters right in MO files, where @key{NUL} bytes may
5917accidentally appear.  (No, we don't want to have wide characters in MO
5918files.  They would make the file unnecessarily large, and the
5919@samp{wchar_t} type being platform dependent, MO files would be
5920platform dependent as well.)
5921
This particular issue has been strongly debated in the GNU
@code{gettext} development forum, and it is to be expected that the MO file
format will evolve or change over time.  It is even possible that many
5925formats may later be supported concurrently.  But surely, we have to
5926start somewhere, and the MO file format described here is a good start.
5927Nothing is cast in concrete, and the format may later evolve fairly
5928easily, so we should feel comfortable with the current approach.
5929
5930@example
5931@group
5932        byte
5933             +------------------------------------------+
5934          0  | magic number = 0x950412de                |
5935             |                                          |
5936          4  | file format revision = 0                 |
5937             |                                          |
5938          8  | number of strings                        |  == N
5939             |                                          |
5940         12  | offset of table with original strings    |  == O
5941             |                                          |
5942         16  | offset of table with translation strings |  == T
5943             |                                          |
5944         20  | size of hashing table                    |  == S
5945             |                                          |
5946         24  | offset of hashing table                  |  == H
5947             |                                          |
5948             .                                          .
5949             .    (possibly more entries later)         .
5950             .                                          .
5951             |                                          |
5952          O  | length & offset 0th string  ----------------.
5953      O + 8  | length & offset 1st string  ------------------.
5954              ...                                    ...   | |
5955O + ((N-1)*8)| length & offset (N-1)th string           |  | |
5956             |                                          |  | |
5957          T  | length & offset 0th translation  ---------------.
5958      T + 8  | length & offset 1st translation  -----------------.
5959              ...                                    ...   | | | |
5960T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
5961             |                                          |  | | | |
5962          H  | start hash table                         |  | | | |
5963              ...                                    ...   | | | |
5964  H + S * 4  | end hash table                           |  | | | |
5965             |                                          |  | | | |
5966             | NUL terminated 0th string  <----------------' | | |
5967             |                                          |    | | |
5968             | NUL terminated 1st string  <------------------' | |
5969             |                                          |      | |
5970              ...                                    ...       | |
5971             |                                          |      | |
5972             | NUL terminated 0th translation  <---------------' |
5973             |                                          |        |
5974             | NUL terminated 1st translation  <-----------------'
5975             |                                          |
5976              ...                                    ...
5977             |                                          |
5978             +------------------------------------------+
5979@end group
5980@end example
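
The fixed part of the picture, bytes 0 to 27, can be read with a few
lines of C.  This is only a sketch: the structure and function names are
hypothetical, and the code checks nothing beyond the magic number and
the minimum size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MO_HEADER_SIZE 28       /* seven 32-bit words, bytes 0..27 */

struct mo_header
{
  uint32_t revision;            /* byte  4 */
  uint32_t nstrings;            /* byte  8: N */
  uint32_t orig_tab_offset;     /* byte 12: O */
  uint32_t trans_tab_offset;    /* byte 16: T */
  uint32_t hash_tab_size;       /* byte 20: S */
  uint32_t hash_tab_offset;     /* byte 24: H */
};

/* Read a 32-bit word stored least significant byte first.  */
uint32_t
mo_read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Read a 32-bit word stored most significant byte first.  */
uint32_t
mo_read_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Fill *OUT from the first 28 bytes of DATA.  Returns 0 on success,
   -1 if DATA is too short or does not start with the magic number.  */
int
mo_parse_header (const unsigned char *data, size_t size,
                 struct mo_header *out)
{
  uint32_t (*read32) (const unsigned char *);

  assert (data != NULL && out != NULL);
  if (size < MO_HEADER_SIZE)
    return -1;
  if (mo_read_le32 (data) == 0x950412deu)
    read32 = mo_read_le32;      /* least significant byte first */
  else if (mo_read_be32 (data) == 0x950412deu)
    read32 = mo_read_be32;      /* most significant byte first */
  else
    return -1;

  out->revision = read32 (data + 4);
  out->nstrings = read32 (data + 8);
  out->orig_tab_offset = read32 (data + 12);
  out->trans_tab_offset = read32 (data + 16);
  out->hash_tab_size = read32 (data + 20);
  out->hash_tab_offset = read32 (data + 24);
  return 0;
}
```

Reading the words byte by byte, instead of casting the buffer to an
integer pointer, keeps the sketch independent of the byte order and
alignment rules of the machine it runs on.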
5981
5982@node Programmers
5983@chapter The Programmer's View
5984
5985@c FIXME: Reorganize whole chapter.
5986
5987One aim of the current message catalog implementation provided by
5988GNU @code{gettext} was to use the system's message catalog handling, if the
5989installer wishes to do so.  So we perhaps should first take a look at
5990the solutions we know about.  The people in the POSIX committee did not
5991manage to agree on one of the semi-official standards which we'll
5992describe below.  In fact they couldn't agree on anything, so they decided
5993only to include an example of an interface.  The major Unix vendors
5994are split in the usage of the two most important specifications: X/Open's
5995catgets vs. Uniforum's gettext interface.  We'll describe them both and
5996later explain our solution of this dilemma.
5997
5998@menu
5999* catgets::                     About @code{catgets}
6000* gettext::                     About @code{gettext}
6001* Comparison::                  Comparing the two interfaces
6002* Using libintl.a::             Using libintl.a in own programs
6003* gettext grok::                Being a @code{gettext} grok
6004* Temp Programmers::            Temporary Notes for the Programmers Chapter
6005@end menu
6006
6007@node catgets
6008@section About @code{catgets}
6009@cindex @code{catgets}, X/Open specification
6010
6011The @code{catgets} implementation is defined in the X/Open Portability
6012Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
6013process of creating this standard seemed to be too slow for some of
6014the Unix vendors so they created their implementations on preliminary
6015versions of the standard.  Of course this leads again to problems while
6016writing platform independent programs: even the usage of @code{catgets}
6017does not guarantee a unique interface.
6018
Another, personal comment on this: only a bunch of committee members
could have made this interface.  They never really tried to program
using this interface.  It is a fast, memory-saving implementation; a
user can happily live with it.  But programmers hate it (at least I and
some others do@dots{}).
6024
But we must not forget one point: after all the trouble with transferring
the rights to Unix, they at last came to X/Open, the very same who
published this specification.  This leads me to making the prediction
that this interface will be in future Unix standards (e.g.@: Spec1170) and
therefore part of all Unix implementations (implementations which are
@emph{allowed} to bear this name).
6031
6032@menu
6033* Interface to catgets::        The interface
6034* Problems with catgets::       Problems with the @code{catgets} interface?!
6035@end menu
6036
6037@node Interface to catgets
6038@subsection The Interface
6039@cindex interface to @code{catgets}
6040
6041The interface to the @code{catgets} implementation consists of three
6042functions which correspond to those used in file access: @code{catopen}
6043to open the catalog for using, @code{catgets} for accessing the message
6044tables, and @code{catclose} for closing after work is done.  Prototypes
6045for the functions and the needed definitions are in the
6046@code{<nl_types.h>} header file.
6047
6048@cindex @code{catopen}, a @code{catgets} function
@code{catopen} is used like this:
6050
6051@example
6052nl_catd catd = catopen ("catalog_name", 0);
6053@end example
6054
The function takes as the argument the name of the catalog.  This usually
refers to the name of the program or the package.  The second parameter
is not further specified in the standard.  I don't even know whether it
is implemented consistently among various systems.  So the common advice
is to use @code{0} as the value.  The return value is a handle to the
message catalog, equivalent to the file handles returned by @code{open}.
6061
6062@cindex @code{catgets}, a @code{catgets} function
6063This handle is of course used in the @code{catgets} function which can
6064be used like this:
6065
6066@example
6067char *translation = catgets (catd, set_no, msg_id, "original string");
6068@end example
6069
6070The first parameter is this catalog descriptor.  The second parameter
6071specifies the set of messages in this catalog, in which the message
6072described by @code{msg_id} is obtained.  @code{catgets} therefore uses a
6073three-stage addressing:
6074
6075@display
6076catalog name @result{} set number @result{} message ID @result{} translation
6077@end display
6078
6079@c Anybody else loving Haskell??? :-) -- Uli
6080
The fourth argument is not used to address the translation.  It is given
as a default value in case one of the addressing stages fails.  One
important thing to remember is that although the return type of @code{catgets}
is @code{char *} the resulting string @emph{must not} be changed.  It
would better be @code{const char *}, but the standard was published in
1988, one year before ANSI C.
6087
6088@noindent
6089@cindex @code{catclose}, a @code{catgets} function
6090The last of these functions is used and behaves as expected:
6091
6092@example
6093catclose (catd);
6094@end example
6095
6096After this no @code{catgets} call using the descriptor is legal anymore.
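
Put together, the three calls might be used as in the following sketch.
The function @code{fetch_message} and its helper are hypothetical.  The
check against @code{(nl_catd) -1} for a failed @code{catopen} comes from
the POSIX specification rather than from the description above.  The
message is copied into a caller-supplied buffer, since this sketch does
not assume that the string returned by @code{catgets} stays usable after
@code{catclose}.

```c
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>
#include <string.h>

/* Copy TEXT into BUF, truncating it to fit BUFSIZE bytes.  */
static void
copy_message (char *buf, size_t bufsize, const char *text)
{
  size_t n = strlen (text);

  if (n >= bufsize)
    n = bufsize - 1;
  memcpy (buf, text, n);
  buf[n] = '\0';
}

/* Store message MSG_ID of set SET_NO from catalog CATALOG_NAME in BUF,
   or FALLBACK if it cannot be found.  Returns 0 if the catalog could
   be opened, -1 otherwise.  */
int
fetch_message (const char *catalog_name, int set_no, int msg_id,
               const char *fallback, char *buf, size_t bufsize)
{
  nl_catd catd;

  assert (fallback != NULL && buf != NULL && bufsize > 0);
  catd = catopen (catalog_name, 0);
  if (catd == (nl_catd) -1)
    {
      copy_message (buf, bufsize, fallback);
      return -1;
    }
  /* Copy the result before the descriptor is closed.  */
  copy_message (buf, bufsize, catgets (catd, set_no, msg_id, fallback));
  catclose (catd);
  return 0;
}
```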
6097
6098@node Problems with catgets
6099@subsection Problems with the @code{catgets} Interface?!
6100@cindex problems with @code{catgets} interface
6101
6102Now that this description seemed to be really easy --- where are the
6103problems we speak of?  In fact the interface could be used in a
6104reasonable way, but constructing the message catalogs is a pain.  The
6105reason for this lies in the third argument of @code{catgets}: the unique
6106message ID.  This has to be a numeric value for all messages in a single
set.  Perhaps you can imagine the problems of keeping such a list while
changing the source code.  Add a new message here, remove one there.  Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another.  We don't
want to say that the other approach has no problems, but they are far
easier to manage.
6113
6114@node gettext
6115@section About @code{gettext}
6116@cindex @code{gettext}, a programmer's view
6117
6118The definition of the @code{gettext} interface comes from a Uniforum
6119proposal.  It was submitted there by Sun, who had implemented the
6120@code{gettext} function in SunOS 4, around 1990.  Nowadays, the
6121@code{gettext} interface is specified by the OpenI18N standard.
6122
6123The main point about this solution is that it does not follow the
6124method of normal file handling (open-use-close) and that it does not
6125burden the programmer with so many tasks, especially the unique key handling.
6126Of course here also a unique key is needed, but this key is the message
itself (however long or short it is).  See @ref{Comparison} for a more
6128detailed comparison of the two methods.
6129
6130The following section contains a rather detailed description of the
6131interface.  We make it that detailed because this is the interface
6132we chose for the GNU @code{gettext} Library.  Programmers interested
6133in using this library will be interested in this description.
6134
6135@menu
6136* Interface to gettext::        The interface
6137* Ambiguities::                 Solving ambiguities
6138* Locating Catalogs::           Locating message catalog files
6139* Charset conversion::          How to request conversion to Unicode
6140* Contexts::                    Solving ambiguities in GUI programs
6141* Plural forms::                Additional functions for handling plurals
6142* Optimized gettext::           Optimization of the *gettext functions
6143@end menu
6144
6145@node Interface to gettext
6146@subsection The Interface
6147@cindex @code{gettext} interface
6148
6149The minimal functionality an interface must have is a) to select a
6150domain the strings are coming from (a single domain for all programs is
6151not reasonable because its construction and maintenance is difficult,
6152perhaps impossible) and b) to access a string in a selected domain.
6153
6154This is principally the description of the @code{gettext} interface.  It
6155has a global domain which unqualified usages reference.  Of course this
6156domain is selectable by the user.
6157
6158@example
6159char *textdomain (const char *domain_name);
6160@end example
6161
This provides the possibility to change or query the current status of
the current global domain of the @code{LC_MESSAGES} category.  The
argument is a null-terminated string, whose characters must be legal
for use in filenames.  If the @var{domain_name} argument is @code{NULL},
6166the function returns the current value.  If no value has been set
6167before, the name of the default domain is returned: @emph{messages}.
6168Please note that although the return value of @code{textdomain} is of
6169type @code{char *} no changing is allowed.  It is also important to know
6170that no checks of the availability are made.  If the name is not
6171available you will see this by the fact that no translations are provided.
6172
6173@noindent
6174To use a domain set by @code{textdomain} the function
6175
6176@example
6177char *gettext (const char *msgid);
6178@end example
6179
6180@noindent
6181is to be used.  This is the simplest reasonable form one can imagine.
6182The translation of the string @var{msgid} is returned if it is available
6183in the current domain.  If it is not available, the argument itself is
6184returned.  If the argument is @code{NULL} the result is undefined.
6185
One thing which should come to mind is that no explicit dependency on
the used domain is given.  The current value of the domain is used.
If this changes between two
executions of the same @code{gettext} call in the program, the two calls
reference different message catalogs.
6191
6192For the easiest case, which is normally used in internationalized
6193packages, once at the beginning of execution a call to @code{textdomain}
6194is issued, setting the domain to a unique name, normally the package
6195name.  In the following code all strings which have to be translated are
6196filtered through the gettext function.  That's all, the package speaks
6197your language.
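
The easiest case could be sketched like this.  The package name
@code{hello-demo} used in the note below and both function names are
hypothetical.  The call to @code{setlocale}, which is not discussed in
this section, is included on the assumption that the program wants to
follow the locale selected by the user.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Select PACKAGE as the global domain, once at program start.
   Returns 1 if PACKAGE is now the current domain, 0 otherwise.  */
int
init_messages (const char *package)
{
  assert (package != NULL);
  setlocale (LC_ALL, "");               /* follow the user's locale */
  if (textdomain (package) == NULL)
    return 0;
  /* A NULL argument only queries the current domain.  */
  return strcmp (textdomain (NULL), package) == 0;
}

/* Every translatable string is passed through gettext.  */
const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
```

After @code{init_messages ("hello-demo")}, @code{greeting} returns the
translation if a catalog for @code{hello-demo} is installed for the
user's language, and the English string otherwise.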
6198
6199@node Ambiguities
6200@subsection Solving Ambiguities
6201@cindex several domains
6202@cindex domain ambiguities
6203@cindex large package
6204
6205While this single name domain works well for most applications there
6206might be the need to get translations from more than one domain.  Of
6207course one could switch between different domains with calls to
@code{textdomain}, but this is neither convenient nor fast.  A
possible situation could be one case subject to discussion during this
writing:  all
error messages of functions in the set of commonly used functions should
go into a separate domain @code{error}.  By this means we would only need
to translate them once.
Another case is messages from a library, as these @emph{have} to be
independent of the current domain set by the application.

@noindent
For these reasons there are two more functions to retrieve strings:
6219
6220@example
6221char *dgettext (const char *domain_name, const char *msgid);
6222char *dcgettext (const char *domain_name, const char *msgid,
6223                 int category);
6224@end example
6225
Both take an additional argument at the first place, which corresponds
to the argument of @code{textdomain}.  The third argument of
@code{dcgettext} allows the use of a locale category other than @code{LC_MESSAGES}.
But I really don't know where this can be useful.  If the
@var{domain_name} is @code{NULL} or @var{category} has a value other than
the known ones, the result is undefined.  It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
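
For the library case, a sketch along the following lines keeps the
library's messages independent of whatever domain the application has
selected.  The domain name @code{libfoo} and the function names are
hypothetical.

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

#define LIBFOO_DOMAIN "libfoo"  /* hypothetical domain of the library */

/* Translate MSGID in the library's own domain, whatever domain the
   application has selected with textdomain.  */
const char *
libfoo_gettext (const char *msgid)
{
  assert (msgid != NULL);       /* a NULL msgid gives an undefined result */
  return dgettext (LIBFOO_DOMAIN, msgid);
}

/* Return 1 if a translation that differs from MSGID was found.  */
int
libfoo_is_translated (const char *msgid)
{
  return strcmp (libfoo_gettext (msgid), msgid) != 0;
}
```

Because the domain is passed with each call, the application's own
@code{textdomain} setting is neither consulted nor changed.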
6234
A second ambiguity can arise from the fact that perhaps more than one
domain has the same name.  This can be solved by specifying where the
needed message catalog files can be found.
6238
6239@example
6240char *bindtextdomain (const char *domain_name,
6241                      const char *dir_name);
6242@end example
6243
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below).  In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using @code{textdomain}).  A
@code{NULL} pointer for the @var{dir_name} parameter returns the binding
associated with @var{domain_name}.  If @var{domain_name} itself is
@code{NULL} nothing happens and a @code{NULL} pointer is returned.  Here
again, as for all the other functions, none of the return
values may be changed!
6253
6254It is important to remember that relative path names for the
6255@var{dir_name} parameter can be trouble.  Since the path is always
6256computed relative to the current directory different results will be
6257achieved when the program executes a @code{chdir} command.  Relative
6258paths should always be avoided to avoid dependencies and
6259unreliabilities.
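
One way to guard against the relative-path problem is to check the
directory before binding it.  The following is only a sketch: it assumes
POSIX-style path names, where an absolute path starts with @samp{/}, and
the two helper names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Bind DOMAIN to DIR only if DIR is an absolute POSIX path.
   Returns the bound directory name, or NULL if DIR is relative or the
   binding could not be made.  The result must not be modified.  */
const char *
bind_domain_absolute (const char *domain, const char *dir)
{
  assert (domain != NULL && dir != NULL);
  if (dir[0] != '/')
    return NULL;                /* refuse relative paths */
  return bindtextdomain (domain, dir);
}

/* Return nonzero if DOMAIN is currently bound to DIR.  Passing NULL as
   the directory only queries the existing binding.  */
int
domain_is_bound_to (const char *domain, const char *dir)
{
  const char *current = bindtextdomain (domain, NULL);
  return current != NULL && strcmp (current, dir) == 0;
}
```
@end verbatim

A program would typically call such a helper once at startup, before the
first lookup in that domain.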

@example
wchar_t *wbindtextdomain (const char *domain_name,
                          const wchar_t *dir_name);
@end example

This function is provided only on native Windows platforms.  It is like
@code{bindtextdomain}, except that the @var{dir_name} parameter is a
wide string (in UTF-16 encoding, as usual on Windows).

@node Locating Catalogs
@subsection Locating Message Catalog Files
@cindex message catalog files location

Because many different languages for many different packages have to be
stored, we need some way to attach this information to the message catalog
files.  The way usually used in Unix environments is to have this encoding
in the file name.  This is also done here.  The directory name given in
the second argument of @code{bindtextdomain} (or the default directory),
the name of the locale, the locale category, and the domain name
are concatenated:

@example
@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
@end example

The default value for @var{dir_name} is system specific.  For the GNU
library, and for packages adhering to its conventions, it's:
@example
/usr/local/share/locale
@end example

@noindent
@var{locale} is the name of the locale category which is designated by
@code{LC_@var{category}}.  For @code{gettext} and @code{dgettext} this
@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
systems, e.g.@: mingw, don't have @code{LC_MESSAGES}.  Here we use a more or
less arbitrary value for it, namely 1729, the smallest positive integer
which can be represented in two different ways as the sum of two cubes.}
The name of the locale category is determined through
@code{setlocale (LC_@var{category}, NULL)}.@footnote{When the system does
not support @code{setlocale} its behavior
in setting the locale values is simulated by looking at the environment
variables.}
When using the function @code{dcgettext}, you can specify the locale category
through the third argument.
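
The concatenation described above can be sketched as a small helper.
@code{catalog_path} is a hypothetical function, not part of
@code{libintl}; it only builds the file name from its four components and
does not model anything else the implementation may do while searching.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write DIR_NAME/LOCALE/CATEGORY_NAME/DOMAIN_NAME.mo into BUF.
   CATEGORY_NAME is the spelled-out category, e.g. "LC_MESSAGES".
   Returns 0 on success, -1 if BUF is too small.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category_name,
              const char *domain_name)
{
  int n;

  assert (buf != NULL && size > 0);
  assert (strlen (domain_name) > 0);    /* a domain needs a name */
  n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                dir_name, locale, category_name, domain_name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```
@end verbatim

For instance, the directory @file{/usr/local/share/locale}, the locale
@code{de}, the category @code{LC_MESSAGES}, and a domain named
@code{hello} give
@file{/usr/local/share/locale/de/LC_MESSAGES/hello.mo}.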

@node Charset conversion
@subsection How to specify the output character set @code{gettext} uses
@cindex charset conversion at runtime
@cindex encoding conversion at runtime

@code{gettext} not only looks up a translation in a message catalog.  It
also converts the translation on the fly to the desired output character
set.  This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

The output character set is, by default, the value of @code{nl_langinfo
(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
locale.  But programs which store strings in a locale independent way
(e.g.@: UTF-8) can request that @code{gettext} and related functions
return the translations in that encoding, by use of the
@code{bind_textdomain_codeset} function.

Note that the @var{msgid} argument to @code{gettext} is not subject to
character set conversion.  Also, when @code{gettext} does not find a
translation for @var{msgid}, it returns @var{msgid} unchanged --
independently of the current output character set.  It is therefore
recommended that all @var{msgid}s be US-ASCII strings.

@deftypefun {char *} bind_textdomain_codeset (const@tie{}char@tie{}*@var{domainname}, const@tie{}char@tie{}*@var{codeset})
The @code{bind_textdomain_codeset} function can be used to specify the
output character set for message catalogs for domain @var{domainname}.
The @var{codeset} argument must be a valid codeset name which can be used
for the @code{iconv_open} function, or a null pointer.

If the @var{codeset} parameter is the null pointer,
@code{bind_textdomain_codeset} returns the currently selected codeset
for the domain with the name @var{domainname}.  It returns @code{NULL} if
no codeset has yet been selected.

The @code{bind_textdomain_codeset} function can be used several times.
If used multiple times with the same @var{domainname} argument, the
later call overrides the settings made by the earlier one.

The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset.  The string is
allocated internally in the function and must not be changed by the
user.  If the system runs out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @code{errno} is set accordingly.
@end deftypefun
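
A program that keeps all its strings in UTF-8 might wrap the call as in
the following sketch.  The helper name @code{request_utf8_output} is
hypothetical, and the check of the returned name assumes an implementation
that hands back the codeset name as it was passed in, which is how the GNU
implementation is described above; other implementations may behave
differently.

@verbatim
```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Ask for translations of DOMAIN to be returned in UTF-8, independently
   of the LC_CTYPE part of the locale.
   Returns 1 on success, 0 on failure (e.g. out of memory).  */
int
request_utf8_output (const char *domain)
{
  const char *selected;

  assert (domain != NULL);
  selected = bind_textdomain_codeset (domain, "UTF-8");
  /* The returned string belongs to the library and must not be changed.  */
  return selected != NULL && strcmp (selected, "UTF-8") == 0;
}
```
@end verbatim

Calling @code{bind_textdomain_codeset} with a null pointer as the second
argument afterwards returns the codeset selected this way.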

@node Contexts
@subsection Using contexts for solving ambiguities
@cindex context
@cindex GUI programs
@cindex translating menu entries
@cindex menu entries

One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs).  The
problem is that many of the strings which have to be translated are very
short.  They have to appear in pull-down menus, which restricts the
length.  But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program and might need different translations.  This is
especially true for the one-word strings which are frequently used in
GUI programs.

As a consequence many people say that the @code{gettext} approach is
wrong and instead @code{catgets} should be used, which indeed does not
have this problem.  But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.

Contexts can be added to strings to be translated.  A context-dependent
translation lookup is a search for the translation of a given string
that is limited to a given context.  The translation for the same string
in a different context can be different.  The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.

The @file{gettext.h} include file contains the lookup macros for strings
with contexts.  They are implemented as thin macros and inline functions
over the functions from @code{<libintl.h>}.

@findex pgettext
@example
const char *pgettext (const char *msgctxt, const char *msgid);
@end example

In a call of this macro, @var{msgctxt} and @var{msgid} must be string
literals.  The macro returns the translation of @var{msgid}, restricted
to the context given by @var{msgctxt}.

The @var{msgctxt} string is visible in the PO file to the translator.
You should try to make it canonical and never changing, because
every time you change a @var{msgctxt}, the translator will have to review
the translation of @var{msgid}.

Finding a canonical @var{msgctxt} string that doesn't change over time can
be hard.  But you shouldn't use the file name or class name containing the
@code{pgettext} call -- because it is a common development task to rename
a file or a class, and it shouldn't cause translator work.  Also you shouldn't
use a comment in the form of a complete English sentence as @var{msgctxt} --
because orthography or grammar changes are often applied to such sentences,
and again, it shouldn't force the translator to do a review.

The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext}
fetches a particular translation of the @var{msgid}.

@findex dpgettext
@findex dcpgettext
@example
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
@end example

These are generalizations of @code{pgettext}.  They behave similarly to
@code{dgettext} and @code{dcgettext}, respectively.  The @var{domain_name}
argument defines the translation domain.  The @var{category} argument
allows the use of a locale category other than @code{LC_MESSAGES}.

As an example consider the following fictional situation.  A GUI program
has a menu bar with the following entries:

@smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
@end smallexample

To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated there has to be
at some point in the code a call to a function of the @code{gettext}
family.  But in two places the string passed into the function would be
@code{Open}.  The translations might not be the same and therefore we
are in the dilemma described above.

What distinguishes the two places is the menu path from the menu root to
the particular menu entries:

@smallexample
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
@end smallexample

The context is thus the menu path without its last part.  So, the calls
look like this:

@smallexample
pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")
@end smallexample

Whether or not to use the @samp{|} character at the end of the context is a
matter of style.

For more complex cases, where the @var{msgctxt} or @var{msgid} are not
string literals, more general macros are available:

@findex pgettext_expr
@findex dpgettext_expr
@findex dcpgettext_expr
@example
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
@end example

Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions.
These macros are more general.  But in the case that both argument expressions
are string literals, the macros without the @samp{_expr} suffix are more
efficient.
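
To illustrate how a context lookup can be layered over plain
@code{gettext}, here is a sketch of a function comparable to
@code{pgettext_expr}.  It assumes that context and message are stored in
the catalog under a single key in which the two are joined by the
character EOT (@code{\004}).  The name @code{my_pgettext_expr} and the
fixed buffer size are choices made for this illustration; this is not the
code found in @file{gettext.h}.

@verbatim
```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: look up MSGID restricted to the context MSGCTXT
   by searching for the combined key "MSGCTXT\004MSGID".  */
const char *
my_pgettext_expr (const char *msgctxt, const char *msgid)
{
  char key[1024];
  const char *translation;
  int n;

  assert (msgctxt != NULL && msgid != NULL);
  n = snprintf (key, sizeof key, "%s\004%s", msgctxt, msgid);
  if (n < 0 || (size_t) n >= sizeof key)
    return msgid;               /* key too long for this sketch */

  translation = gettext (key);
  /* gettext returns its argument unchanged when nothing is found; in
     that case fall back to MSGID without the context prefix.  */
  return translation == key ? msgid : translation;
}
```
@end verbatim

A call such as @code{my_pgettext_expr ("Menu|Printer|", "Open")} then
yields the plain string @code{Open} whenever no translation is available.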

@node Plural forms
@subsection Additional functions for plural forms
@cindex plural forms

The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches.  What
is meant here is the handling of plural forms.

Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:

@smallexample
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
@end smallexample

@noindent
After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}.  Both look unnatural and should be avoided.  First
tries to solve the problem correctly looked like this:

@smallexample
   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);
@end smallexample

But this does not solve the problem.  It helps languages where the
plural form of a noun is not simply constructed by adding an
@ifhtml
‘s’
@end ifhtml
@ifnothtml
`s'
@end ifnothtml
but that is all.  Once again people fell into the trap of believing the
rules their language is using are universal.  But the handling of plural
forms differs widely between the language families.  For example,
Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:

@quotation
In Polish we use e.g.@: plik (file) this way:
@example
1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w
@end example
and so on (o' means 8859-2 oacute which should be rather okreska,
similar to aogonek).
@end quotation

There are two things which can differ between languages (and even inside
language families):

@itemize @bullet
@item
The way plural forms are built differs.  This is a problem with
languages which have many irregularities.  German, for instance, is a
drastic case.  Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an
@ifhtml
‘s’)
@end ifhtml
@ifnothtml
`s')
@end ifnothtml
is hardly found in German.

@item
The number of plural forms differs.  This is somewhat surprising for
those who only have experience with Romanic and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms.  More
information on this in an extra section.
@end itemize

The consequence of this is that application writers should not try to
solve the problem in their code.  This would be localization since it is
only usable for certain, hardcoded language environments.  Instead the
extended @code{gettext} interface should be used.

These extra functions take, instead of the one key string, two
strings and a numerical argument.  The idea behind this is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator.  The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior).  In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language.  This is a limitation but since the GNU C library
(as well as the GNU @code{gettext} package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

@deftypefun {char *} ngettext (const@tie{}char@tie{}*@var{msgid1}, const@tie{}char@tie{}*@var{msgid2}, unsigned@tie{}long@tie{}int@tie{}@var{n})
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way.  But it takes two
extra arguments.  The @var{msgid1} parameter must contain the singular
form of the string to be converted.  It is also used as the key for the
search in the catalog.  The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form.  If no
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.

An example for the use of this function is:

@smallexample
printf (ngettext ("%d file removed", "%d files removed", n), n);
@end smallexample

Please note that the numeric value @var{n} has to be passed to the
@code{printf} function as well.  It is not sufficient to pass it only to
@code{ngettext}.

In the English singular case, the number -- always 1 -- can be replaced with
``one'':

@smallexample
printf (ngettext ("One file removed", "%d files removed", n), n);
@end smallexample

@noindent
This works because the @samp{printf} function discards excess arguments that
are not consumed by the format string.

If this function is meant to yield a format string that takes two or more
arguments, you cannot use it like this:

@smallexample
printf (ngettext ("%d file removed from directory %s",
                  "%d files removed from directory %s",
                  n),
        n, dir);
@end smallexample

@noindent
because in many languages the translators want to replace the @samp{%d}
with an explicit word in the singular case, just like ``one'' in English,
and C format strings cannot consume the second argument but skip the first
argument.  Instead, you have to reorder the arguments so that @samp{n}
comes last:

@smallexample
printf (ngettext ("%2$d file removed from directory %1$s",
                  "%2$d files removed from directory %1$s",
                  n),
        dir, n);
@end smallexample

@noindent
See @ref{c-format} for details about this argument reordering syntax.

When you know that the value of @code{n} is within a given range, you can
specify it as a comment directed to the @code{xgettext} tool.  This
information may help translators to use more adequate translations.  Like
this:

@smallexample
if (days > 7 && days < 14)
  /* xgettext: range: 1..6 */
  printf (ngettext ("one week and one day", "one week and %d days",
                    days - 7),
          days - 7);
@end smallexample

It is also possible to use this function when the strings don't contain a
cardinal number:

@smallexample
puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));
@end smallexample

In this case the number @var{n} is only used to choose the plural form.
@end deftypefun

@deftypefun {char *} dngettext (const@tie{}char@tie{}*@var{domain}, const@tie{}char@tie{}*@var{msgid1}, const@tie{}char@tie{}*@var{msgid2}, unsigned@tie{}long@tie{}int@tie{}@var{n})
The @code{dngettext} function is similar to the @code{dgettext} function in the
way the message catalog is selected.  The difference is that it takes
two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun

@deftypefun {char *} dcngettext (const@tie{}char@tie{}*@var{domain}, const@tie{}char@tie{}*@var{msgid1}, const@tie{}char@tie{}*@var{msgid2}, unsigned@tie{}long@tie{}int@tie{}@var{n}, int@tie{}@var{category})
The @code{dcngettext} function is similar to the @code{dcgettext} function in the
way the message catalog is selected.  The difference is that it takes
two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun
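
As a sketch of how a library might use @code{dngettext}, the following
hypothetical helper formats a message in the library's own domain.  The
domain name @code{libfoo} and the function name are assumptions made for
this illustration only.

@verbatim
```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical text domain of an imaginary library "libfoo".  */
#define LIBFOO_DOMAIN "libfoo"

/* Write a "files removed" message for N files into BUF, using the
   library's own domain.  Returns what snprintf returns.  */
int
libfoo_format_removed (char *buf, size_t size, unsigned long int n)
{
  assert (buf != NULL && size > 0);
  /* N is passed twice: to dngettext to select the plural form, and to
     snprintf to fill in the %lu directive.  */
  return snprintf (buf, size,
                   dngettext (LIBFOO_DOMAIN,
                              "%lu file removed", "%lu files removed", n),
                   n);
}
```
@end verbatim

Without a catalog for the domain, the helper falls back to the two English
strings and chooses between them by testing whether @var{n} equals 1.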

Now, how do these functions solve the problem of the plural forms?
Without the input of linguists (which was not available) it was not
possible to determine whether there are only a few different ways in
which plural forms are formed or whether the number can increase with
every new supported language.

Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form.  Since the formula varies
with every language this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages).

@cindex specifying plural form in a PO file
@kwindex nplurals@r{, in a PO file header}
@kwindex plural@r{, in a PO file header}
The information about the plural form selection has to be stored in the
header entry of the PO file (the one with the empty @code{msgid} string).
The plural form information looks like this:

@smallexample
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
@end smallexample

The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language.  The string
following @code{plural} is an expression which uses the C language
syntax.  Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}.  Spaces are
allowed in the expression, but backslash-newlines are not; in the
examples below the backslash-newlines are present for formatting purposes
only.  This expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression.  The resulting value then
must be greater than or equal to zero and smaller than the value given as the
value of @code{nplurals}.

@noindent
@cindex plural form formulas
The following rules are known at this point.  The languages are listed
with their families.  But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).@footnote{Additions are welcome.  Send appropriate information to
@email{bug-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}.
The Unicode CLDR Project (@uref{http://cldr.unicode.org}) provides a
comprehensive set of plural forms in a different format.  The
@code{msginit} program has preliminary support for the format so you can
use it as a baseline (@pxref{msginit Invocation}).}

@table @asis
@item Only one form:
Some languages only require one single form.  There is no distinction
between the singular and plural form.  An appropriate header entry
would look like this:

@smallexample
Plural-Forms: nplurals=1; plural=0;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Asian family
Japanese, @c   122.1 million speakers
Vietnamese, @c  68.6 million speakers
Korean @c       66.3 million speakers
@item Tai-Kadai family
Thai @c         20.4 million speakers
@end table

@item Two forms, singular used for one only
This is the form used in most existing programs since it is what English
is using.  A header entry would look like this:

@smallexample
Plural-Forms: nplurals=2; plural=n != 1;
@end smallexample

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)

@noindent
Languages with this property include:

@table @asis
@item Germanic family
English, @c    328.0 million speakers
German, @c      96.9 million speakers
Dutch, @c       21.7 million speakers
Swedish, @c      8.3 million speakers
Danish, @c       5.6 million speakers
Norwegian, @c    4.6 million speakers
Faroese @c       0.05 million speakers
@item Romanic family
Spanish, @c    328.5 million speakers
Portuguese, @c 178.0 million speakers - 163 million Brazilian Portuguese
Italian @c      61.7 million speakers
@item Latin/Greek family
Greek @c        13.1 million speakers
@item Slavic family
Bulgarian @c     9.1 million speakers
@item Finno-Ugric family
Finnish, @c      5.0 million speakers
Estonian @c      1.0 million speakers
@item Semitic family
Hebrew @c        5.3 million speakers
@item Austronesian family
Bahasa Indonesian @c 23.2 million speakers
@item Artificial
Esperanto @c       2 million speakers
@end table

@noindent
Other languages using the same header entry are:

@table @asis
@item Finno-Ugric family
Hungarian @c   12.5 million speakers
@item Turkic/Altaic family
Turkish @c     50.8 million speakers
@end table

Hungarian does not appear to have a plural if you look at sentences involving
cardinal numbers.  For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is
``123 alma''.  But when the number is not explicit, the distinction between
singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is
``az alm@'{a}k''.  Since @code{ngettext} has to support both types of sentences,
it is classified here, under ``two forms''.

The same holds for Turkish: ``1 apple'' is ``1 elma'', and ``123 apples'' is
``123 elma''.  But when the number is omitted, the distinction between singular
and plural exists: ``the apple'' is ``elma'', and ``the apples'' is
``elmalar''.

@item Two forms, singular used for zero and one
Exceptional case in the language family.  The header entry would be:

@smallexample
Plural-Forms: nplurals=2; plural=n>1;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Romanic family
Brazilian Portuguese, @c 163 million speakers
French @c                 67.8 million speakers
@end table

@item Three forms, special case for zero
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Latvian @c     1.5 million speakers
@end table

@item Three forms, special cases for one and two
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Celtic
Gaeilge (Irish) @c 0.4 million speakers
@end table

@item Three forms, special case for numbers ending in 00 or [2-9][0-9]
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Romanic family
Romanian @c    23.4 million speakers
@end table

@item Three forms, special case for numbers ending in 1[2-9]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Lithuanian @c  3.2 million speakers
@end table

@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Russian, @c    143.6 million speakers
Ukrainian, @c   37.0 million speakers
Belarusian, @c   8.6 million speakers
Serbian, @c      7.0 million speakers
Croatian @c      5.5 million speakers
@end table

@item Three forms, special cases for 1 and 2, 3, 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Czech, @c      9.5 million speakers
Slovak @c      5.0 million speakers
@end table

@item Three forms, special case for one and some numbers ending in 2, 3, or 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Polish @c      40.0 million speakers
@end table

@item Four forms, special case for one and all numbers ending in 02, 03, or 04
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Slovenian @c   1.9 million speakers
@end table

@item Six forms, special cases for one, two, all numbers ending in 02, 03, @dots{} 10, all numbers ending in 11 @dots{} 99, and others
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=6; \
    plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
    : n%100>=11 ? 4 : 5;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Afroasiatic family
Arabic @c    246.0 million speakers
@end table
@end table
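
Since the formulas in the table are ordinary C expressions, they can be
tried out directly.  The following sketch hard-codes two of them as
functions, whereas the library evaluates the expression found in the
header entry of the catalog.  All function names are hypothetical.

@verbatim
```c
#include <assert.h>

/* plural=n != 1, the English-style rule with nplurals=2.  */
unsigned int
plural_two_forms (unsigned long int n)
{
  return n != 1;
}

/* The Polish rule from the table, with nplurals=3.  */
unsigned int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Pick the form with the given INDEX out of NPLURALS available forms.
   The index computed by a formula must always be below NPLURALS.  */
const char *
select_form (const char *const *forms, unsigned int nplurals,
             unsigned int index)
{
  assert (index < nplurals);
  return forms[index];
}
```
@end verbatim

Applied to the Polish example quoted earlier, @code{plural_polish} gives
index 0 for 1, index 1 for 2 to 4 and for 22 to 24, and index 2 for 5 to
21 and for 25 to 31.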

You might now ask: @code{ngettext} handles only numbers @var{n} of type
@samp{unsigned long}.  What about larger integer types?  What about negative
numbers?  What about floating-point numbers?

About larger integer types, such as @samp{uintmax_t} or
@samp{unsigned long long}: they can be handled by reducing the value to a
range that fits in an @samp{unsigned long}.  Simply casting the value to
@samp{unsigned long} would not do the right thing, since it would treat
@code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and
the like.  Here you can exploit the fact that all mentioned plural form
formulas eventually become periodic, with a period that is a divisor of 100
(or 1000 or 1000000).  So, when you reduce a large value to another one in
the range [1000000, 1999999] that ends in the same 6 decimal digits, you
can assume that it will lead to the same plural form selection.  This code
does this:
7018
7019@smallexample
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <libintl.h>
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX
                   ? (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);
7028@end smallexample
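
If the same reduction is needed in several places, it may be convenient
to wrap it in a small helper.  The following is only a sketch; the name
@code{plural_arg} is an assumption of this example and does not exist in
@code{libintl}.

```c
#include <limits.h>
#include <stdint.h>

/* Reduce a count of any magnitude to a value that fits in
   'unsigned long' and is expected to select the same plural form.
   Values above ULONG_MAX are mapped into [1000000, 1999999], keeping
   their last six decimal digits.  Hypothetical helper.  */
static unsigned long
plural_arg (uintmax_t n)
{
  return (n > ULONG_MAX
          ? (unsigned long) (n % 1000000) + 1000000
          : (unsigned long) n);
}
```

The result would be passed as the third argument of @code{ngettext},
while the original, unreduced value is still the one that gets printed.
On platforms where @samp{uintmax_t} and @samp{unsigned long} have the
same width, the helper simply returns its argument.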
7029
7030Negative and floating-point values usually represent physical entities for
7031which singular and plural don't clearly apply.  In such cases, there is no
7032need to use @code{ngettext}; a simple @code{gettext} call with a form suitable
7033for all values will do.  For example:
7034
7035@smallexample
7036printf (gettext ("Time elapsed: %.3f seconds"),
7037        num_milliseconds * 0.001);
7038@end smallexample
7039
7040@noindent
7041Even if @var{num_milliseconds} happens to be a multiple of 1000, the output
7042@smallexample
7043Time elapsed: 1.000 seconds
7044@end smallexample
7045@noindent
7046is acceptable in English, and similarly for other languages.
7047
7048The translators' perspective regarding plural forms is explained in
7049@ref{Translating plural forms}.
7050
7051@node Optimized gettext
7052@subsection Optimization of the *gettext functions
7053@cindex optimization of @code{gettext} functions
7054
At this point of the discussion we should talk about an advantage of the
GNU @code{gettext} implementation.  Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop.  While this is unavoidable
when the string varies from one run of the loop to the other, it is
simply a waste of time when the string is always the same.  Take the
following example:
7062
7063@example
7064@group
7065@{
7066  while (@dots{})
7067    @{
7068      puts (gettext ("Hello world"));
7069    @}
7070@}
7071@end group
7072@end example
7073
7074@noindent
When the locale selection does not change between two runs, the resulting
string is always the same.  One way to exploit this is:
7077
7078@example
7079@group
7080@{
7081  str = gettext ("Hello world");
7082  while (@dots{})
7083    @{
7084      puts (str);
7085    @}
7086@}
7087@end group
7088@end example
7089
7090@noindent
But this solution is not usable in all situations (e.g.@: when the locale
selection changes), nor does it lead to legible code.
7093
7094For this reason, GNU @code{gettext} caches previous translation results.
7095When the same translation is requested twice, with no new message
7096catalogs being loaded in between, @code{gettext} will, the second time,
7097find the result through a single cache lookup.
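
The idea can be pictured with a toy model.  The sketch below is
@emph{not} the data structure used inside GNU @code{gettext}; all names
in it (@code{catalog_generation}, @code{slow_lookup},
@code{cached_lookup}) are invented for illustration.  It remembers the
last result together with a counter that is assumed to change whenever a
new catalog is loaded, and repeats the expensive search only when the
request or the counter differs.

```c
#include <string.h>

/* Hypothetical stand-ins, not part of libintl.  */
static unsigned long catalog_generation; /* bumped when a catalog is loaded */
static unsigned long slow_lookups;       /* counts full catalog searches */

/* Stand-in for the expensive search through the message catalogs.  */
static const char *
slow_lookup (const char *msgid)
{
  slow_lookups++;
  return strcmp (msgid, "Hello world") == 0 ? "Bonjour le monde" : msgid;
}

/* Single-entry cache: reuse the previous result as long as the same
   msgid is requested and no catalog has been loaded in between.  The
   msgid pointer is stored, so it must stay valid (e.g. a string
   literal).  */
static const char *
cached_lookup (const char *msgid)
{
  static const char *last_msgid;
  static const char *last_result;
  static unsigned long last_generation;

  if (last_msgid != NULL
      && last_generation == catalog_generation
      && strcmp (last_msgid, msgid) == 0)
    return last_result;

  last_result = slow_lookup (msgid);
  last_msgid = msgid;
  last_generation = catalog_generation;
  return last_result;
}
```

In this model, calling @code{cached_lookup} repeatedly in a loop performs
the slow search once; incrementing @code{catalog_generation} forces the
next call to search again.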
7098
7099@node Comparison
7100@section Comparing the Two Interfaces
7101@cindex @code{gettext} vs @code{catgets}
7102@cindex comparison of interfaces
7103
7104@c FIXME: arguments to catgets vs. gettext
7105@c Partly done 950718 -- drepper
7106
7107The following discussion is perhaps a little bit colored.  As said
7108above we implemented GNU @code{gettext} following the Uniforum
7109proposal and this surely has its reasons.  But it should show how we
7110came to this decision.
7111
First we take a look at the development process.  When we write an
application using NLS provided by @code{gettext} we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated do we use @code{gettext("@dots{}")} instead of
@code{"@dots{}"}.  At the beginning of each source file (or in a central
header file) we define
7118
7119@example
7120#define gettext(String) (String)
7121@end example
7122
Even this definition can be avoided when the system supports the
@code{gettext} function in its C library.  When we compile this code the
result is the same as if no NLS code is used.  When you take a look at
the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
instead of @code{gettext("@dots{}")}.  This reduces the number of
additional characters per translatable string to @emph{3} (in words:
three).
7130
When a production version of the program is needed, we simply replace
the definition
7133
7134@example
7135#define _(String) (String)
7136@end example
7137
7138@noindent
7139by
7140
7141@cindex include file @file{libintl.h}
7142@example
7143#include <libintl.h>
7144#define _(String) gettext (String)
7145@end example
7146
7147@noindent
Additionally we run the program @file{xgettext} on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.
7152
7153@cindex @code{N_}, a convenience macro
7154The same procedure can be done for the @code{gettext_noop} invocations
7155(@pxref{Special cases}).  One usually defines @code{gettext_noop} as a
7156no-op macro.  So you should consider the following code for your project:
7157
7158@example
7159#define gettext_noop(String) String
7160#define N_(String) gettext_noop (String)
7161@end example
7162
7163@code{N_} is a short form similar to @code{_}.  The @file{Makefile} in
7164the @file{po/} directory of GNU @code{gettext} knows by default both of the
7165mentioned short forms so you are invited to follow this proposal for
7166your own ease.
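
Put together, the two short forms might be used as in the following
sketch.  The @code{ENABLE_NLS} switch is assumed to be provided by the
build system, and the names @code{menu_labels} and @code{menu_label} are
invented for this example.

```c
#include <stddef.h>

#if ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Static initializers cannot call functions, so the strings are only
   marked for extraction with N_ here ...  */
static const char *const menu_labels[] =
{
  N_("Open"),
  N_("Save"),
  N_("Quit")
};

/* ... and are translated at the point where they are actually used.  */
static const char *
menu_label (size_t i)
{
  return _(menu_labels[i]);
}
```

When @code{ENABLE_NLS} is not defined, both macros expand to the plain
string and the program behaves as if no NLS code were present.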
7167
Now to @code{catgets}.  The main problem is the work for the
programmer.  Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file.  He also has to take care of duplicate
entries, duplicate message IDs etc.  If he wants to have the same
quality in the message catalog as the GNU @code{gettext} program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog.  This is
nearly a Mission: Impossible.
7177
7178But there are also some points people might call advantages speaking for
7179@code{catgets}.  If you have a single word in a string and this string
7180is used in different contexts it is likely that in one or the other
7181language the word has different translations.  Example:
7182
7183@example
7184printf ("%s: %d", gettext ("number"), number_of_errors)
7185
7186printf ("you should see %d %s", number_count,
7187        number_count == 1 ? gettext ("number") : gettext ("numbers"))
7188@end example
7189
Here we have to translate the string @code{"number"} twice.  Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning.  In German the
first appearance has to be translated to @code{"Anzahl"} and the second
to @code{"Zahl"}.
7195
Now you can say that this example is really esoteric.  And you are
right!  This is exactly how we felt about this problem, and we decided that
it does not weigh that much.  The solution for the above problem could
be very easy:
7200
7201@example
7202printf ("%s %d", gettext ("number:"), number_of_errors)
7203
7204printf (number_count == 1 ? gettext ("you should see %d number")
7205                          : gettext ("you should see %d numbers"),
7206        number_count)
7207@end example
7208
We believe that we can solve all conflicts with this method.  If it is
difficult one can also consider changing one of the conflicting strings a
little bit.  But it is not impossible to overcome.
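
With the plural-form interface described earlier in this chapter, the
second call would nowadays rather be written with @code{ngettext}, so
that languages with more than two plural forms are handled as well.  The
following is a minimal sketch; @code{format_number_count} is a name
chosen for this example, and the fallback macro for builds without NLS
(selected through an assumed @code{ENABLE_NLS} switch) applies English
rules only.

```c
#include <stdio.h>

#if ENABLE_NLS
# include <libintl.h>
#else
/* Fallback when NLS is disabled: English singular/plural rule only.  */
# define ngettext(Singular, Plural, N) ((N) == 1 ? (Singular) : (Plural))
#endif

/* Format the message for COUNT into BUF, letting ngettext choose the
   plural form.  Returns what snprintf returns.  */
static int
format_number_count (char *buf, size_t size, unsigned long count)
{
  return snprintf (buf, size,
                   ngettext ("you should see %lu number",
                             "you should see %lu numbers", count),
                   count);
}
```

Both complete sentences are visible to the translator, which avoids the
ambiguity of the isolated word.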
7212
@code{catgets} allows the same original entry to have different translations,
but @code{gettext} has another, scalable approach for solving ambiguities
of this kind; see @ref{Ambiguities}.
7216
7217@node Using libintl.a
7218@section Using libintl.a in own programs
7219
Starting with version 0.9.4 the library @file{libintl.a} should be
self-contained.  I.e., you can use it in your own programs without
providing additional functions.  The @file{Makefile} will put the header
and the library in directories selected using the @code{$(prefix)}.
7224
7225@node gettext grok
7226@section Being a @code{gettext} grok
7227
7228@strong{ NOTE: } This documentation section is outdated and needs to be
7229revised.
7230
To fully exploit the functionality of the GNU @code{gettext} library it
is surely helpful to read the source code.  But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:
7235
7236@itemize @bullet
7237@item Changing the language at runtime
7238@cindex language selection at runtime
7239
For interactive programs it might be useful to offer a selection of the
used language at runtime.  To understand how to do this one needs to know
how the used language is determined while executing the @code{gettext}
function.  The method which is presented here only works correctly
with the GNU implementation of the @code{gettext} functions.
7245
7246In the function @code{dcgettext} at every call the current setting of
7247the highest priority environment variable is determined and used.
7248Highest priority means here the following list with decreasing
7249priority:
7250
7251@enumerate
7252@vindex LANGUAGE@r{, environment variable}
7253@item @code{LANGUAGE}
7254@vindex LC_ALL@r{, environment variable}
7255@item @code{LC_ALL}
7256@vindex LC_CTYPE@r{, environment variable}
7257@vindex LC_NUMERIC@r{, environment variable}
7258@vindex LC_TIME@r{, environment variable}
7259@vindex LC_COLLATE@r{, environment variable}
7260@vindex LC_MONETARY@r{, environment variable}
7261@vindex LC_MESSAGES@r{, environment variable}
7262@item @code{LC_xxx}, according to selected locale category
7263@vindex LANG@r{, environment variable}
7264@item @code{LANG}
7265@end enumerate
7266
7267Afterwards the path is constructed using the found value and the
7268translation file is loaded if available.
7269
7270What happens now when the value for, say, @code{LANGUAGE} changes?  According
7271to the process explained above the new value of this variable is found
7272as soon as the @code{dcgettext} function is called.  But this also means
7273the (perhaps) different message catalog file is loaded.  In other
7274words: the used language is changed.
7275
But there is one little catch.  The code for gcc-2.7.0 and up provides
some optimization.  This optimization normally prevents the calling of
the @code{dcgettext} function as long as no new catalog is loaded.  But
if @code{dcgettext} is not called the program also cannot notice that the
@code{LANGUAGE} variable has changed (@pxref{Optimized gettext}).  The
solution for this is very easy.  Include the following code in the
language switching function.
7283
7284@example
7285  /* Change language.  */
7286  setenv ("LANGUAGE", "fr", 1);
7287
7288  /* Make change known.  */
7289  @{
7290    extern int  _nl_msg_cat_cntr;
7291    ++_nl_msg_cat_cntr;
7292  @}
7293@end example
7294
7295@cindex @code{_nl_msg_cat_cntr}
The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
You don't need to know what this is for.  But it can be used to detect
whether a @code{gettext} implementation is GNU gettext and not a non-GNU
system's native gettext implementation.
7300
7301@end itemize
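
The priority list given above for the environment variables can be
sketched in a few lines of C.  This is a simplified model for the
@code{LC_MESSAGES} category only, not the code used by GNU
@code{gettext}: the function name is invented, empty values are assumed
to count as unset, and the real implementation additionally takes the
locale selected through @code{setlocale} into account.

```c
#include <stdlib.h>

/* Return the value of the highest-priority variable that is set and
   non-empty, or NULL if none is.  Simplified, hypothetical model of the
   lookup order LANGUAGE, LC_ALL, LC_MESSAGES, LANG.  */
static const char *
language_setting_for_messages (void)
{
  static const char *const names[] =
    { "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG" };
  size_t i;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}
```

Because the variables are read anew on each call, a changed
@code{LANGUAGE} value is picked up the next time the function runs,
which mirrors the behaviour described for @code{dcgettext}.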
7302
7303@node Temp Programmers
7304@section Temporary Notes for the Programmers Chapter
7305
7306@strong{ NOTE: } This documentation section is outdated and needs to be
7307revised.
7308
7309@menu
7310* Temp Implementations::        Temporary - Two Possible Implementations
7311* Temp catgets::                Temporary - About @code{catgets}
7312* Temp WSI::                    Temporary - Why a single implementation
7313* Temp Notes::                  Temporary - Notes
7314@end menu
7315
7316@node Temp Implementations
7317@subsection Temporary - Two Possible Implementations
7318
7319There are two competing methods for language independent messages:
7320the X/Open @code{catgets} method, and the Uniforum @code{gettext}
7321method.  The @code{catgets} method indexes messages by integers; the
7322@code{gettext} method indexes them by their English translations.
7323The @code{catgets} method has been around longer and is supported
7324by more vendors.  The @code{gettext} method is supported by Sun,
7325and it has been heard that the COSE multi-vendor initiative is
7326supporting it.  Neither method is a POSIX standard; the POSIX.1
7327committee had a lot of disagreement in this area.
7328
7329Neither one is in the POSIX standard.  There was much disagreement
7330in the POSIX.1 committee about using the @code{gettext} routines
7331vs. @code{catgets} (XPG).  In the end the committee couldn't
7332agree on anything, so no messaging system was included as part
7333of the standard.  I believe the informative annex of the standard
7334includes the XPG3 messaging interfaces, ``@dots{}as an example of
7335a messaging system that has been implemented@dots{}''
7336
7337They were very careful not to say anywhere that you should use one
7338set of interfaces over the other.  For more on this topic please
7339see the Programming for Internationalization FAQ.
7340
7341@node Temp catgets
7342@subsection Temporary - About @code{catgets}
7343
7344There have been a few discussions of late on the use of
7345@code{catgets} as a base.  I think it important to present both
7346sides of the argument and hence am opting to play devil's advocate
7347for a little bit.
7348
7349I'll not deny the fact that @code{catgets} could have been designed
7350a lot better.  It currently has quite a number of limitations and
7351these have already been pointed out.
7352
7353However there is a great deal to be said for consistency and
7354standardization.  A common recurring problem when writing Unix
7355software is the myriad portability problems across Unix platforms.
It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon.  Undoubtedly, many of these
modifications are innovative and solve real problems.
7359However, software developers have a hard time keeping up with all
7360these changes across so many platforms.
7361
And this has prompted the Unix vendors to begin to standardize their
systems.  Hence the impetus for Spec1170.  Every major Unix vendor
has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
standard and simply recompile (without having to use autoconf)
across different platforms.
7368
7369As I understand it, Spec1170 is roughly based upon version 4 of the
7370X/Open Portability Guidelines (XPG4).  Because @code{catgets} and
7371friends are defined in XPG4, I'm led to believe that @code{catgets}
7372is a part of Spec1170 and hence will become a standardized component
7373of all Unix systems.
7374
7375@node Temp WSI
7376@subsection Temporary - Why a single implementation
7377
Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs.  If we do want to remedy
the deficiencies of @code{catgets}, why don't we try to expand @code{catgets}
(in a compatible manner) rather than implement an entirely new system?
Otherwise, we'll end up with two message catalog access systems installed
with an operating system: one set of routines for packages using GNU
@code{gettext} for their internationalization, and another set of routines
(@code{catgets}) for all other software.  Bloated?
7386
7387Supposing another catalog access system is implemented.  Which do
7388we recommend?  At least for Linux, we need to attract as many
7389software developers as possible.  Hence we need to make it as easy
7390for them to port their software as possible.  Which means supporting
7391@code{catgets}.  We will be implementing the @code{libintl} code
7392within our @code{libc}, but does this mean we also have to incorporate
7393another message catalog access scheme within our @code{libc} as well?
And what about people who are going to be using the @code{libintl}
+ non-@code{catgets} routines?  When they port their software to
other platforms, they're now going to have to include the front-end
(@code{libintl}) code plus the back-end code (the non-@code{catgets}
access routines) with their software instead of just including the
@code{libintl} code with their software.
7400
7401Message catalog support is however only the tip of the iceberg.
7402What about the data for the other locale categories?  They also have
7403a number of deficiencies.  Are we going to abandon them as well and
7404develop another duplicate set of routines (should @code{libintl}
7405expand beyond message catalog support)?
7406
As with many parts of Unix that can be improved upon, we're stuck with balancing
compatibility with the past against useful improvements and innovations for
the future.
7410
7411@node Temp Notes
7412@subsection Temporary - Notes
7413
X/Open agreed very late on the standard form so that many
implementations differ from the final form.  Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

OK.  After incorporating the last changes I have to spend some time on
making the GNU/Linux @code{libc} @code{gettext} functions.  So in the future
Solaris will not be the only system having @code{gettext}.
7421
7422@node Translators
7423@chapter The Translator's View
7424
7425@c FIXME: Reorganize whole chapter.
7426
7427@menu
7428* Trans Intro 0::               Introduction 0
7429* Trans Intro 1::               Introduction 1
7430* Discussions::                 Discussions
7431* Organization::                Organization
7432* Information Flow::            Information Flow
7433* Translating plural forms::    How to fill in @code{msgstr[0]}, @code{msgstr[1]}
7434* Prioritizing messages::       How to find which messages to translate first
7435@end menu
7436
7437@node Trans Intro 0
7438@section Introduction 0
7439
7440@strong{ NOTE: } This documentation section is outdated and needs to be
7441revised.
7442
7443Free software is going international!  The Translation Project is a way
7444to get maintainers, translators and users all together, so free software
7445will gradually become able to speak many native languages.
7446
7447The GNU @code{gettext} tool set contains @emph{everything} maintainers
7448need for internationalizing their packages for messages.  It also
7449contains quite useful tools for helping translators at localizing
7450messages to their native language, once a package has already been
7451internationalized.
7452
For the Translation Project to succeed, we need many interested
people who like their own language and write it well, and who are also
able to synergize with other translators speaking the same language.
If you'd like to volunteer to @emph{work} at translating messages,
please send mail to your translating team.
7458
7459Each team has its own mailing list, courtesy of Linux
7460International.  You may reach your translating team at the address
7461@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
7462code for your language.  Language codes are @emph{not} the same as
7463country codes given in @w{ISO 3166}.  The following translating teams
7464exist:
7465
7466@quotation
7467Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
7468Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
7469@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
7470Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
7471@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
7472Swedish @code{sv} and Turkish @code{tr}.
7473@end quotation
7474
7475@noindent
7476For example, you may reach the Chinese translating team by writing to
7477@file{zh@@li.org}.  When you become a member of the translating team
7478for your own language, you may subscribe to its list.  For example,
7479Swedish people can send a message to @w{@file{sv-request@@li.org}},
7480having this message body:
7481
7482@example
7483subscribe
7484@end example
7485
7486Keep in mind that team members should be interested in @emph{working}
7487at translations, or at solving translational difficulties, rather than
7488merely lurking around.  If your team does not exist yet and you want to
7489start one, please write to @w{@file{coordinator@@translationproject.org}};
7490you will then reach the coordinator for all translator teams.
7491
7492A handful of GNU packages have already been adapted and provided
7493with message translations for several languages.  Translation
7494teams have begun to organize, using these packages as a starting
7495point.  But there are many more packages and many languages for
7496which we have no volunteer translators.  If you would like to
7497volunteer to work at translating messages, please send mail to
7498@file{coordinator@@translationproject.org} indicating what language(s)
7499you can work on.
7500
7501@node Trans Intro 1
7502@section Introduction 1
7503
7504@strong{ NOTE: } This documentation section is outdated and needs to be
7505revised.
7506
7507This is now official, GNU is going international!  Here is the
7508announcement submitted for the January 1995 GNU Bulletin:
7509
7510@quotation
7511A handful of GNU packages have already been adapted and provided
7512with message translations for several languages.  Translation
7513teams have begun to organize, using these packages as a starting
7514point.  But there are many more packages and many languages
7515for which we have no volunteer translators.  If you'd like to
7516volunteer to work at translating messages, please send mail to
7517@samp{coordinator@@translationproject.org} indicating what language(s)
7518you can work on.
7519@end quotation
7520
This document should answer many questions for those who are curious about
the process or would like to contribute.  Please at least skim over it; we
hope this will cut down a little on the high volume of e-mail generated by this
collective effort towards internationalization of free software.
7525
Most free programming which is widely shared is done in English, and
currently, English is used as the main communicating language between
national communities collaborating on free software.  This very document
is written in English.  This will not change in the foreseeable future.
7530
However, there is a strong appetite from national communities for
having more software able to write using national language and habits,
and there is an on-going effort to modify free software in such a way
that it becomes able to do so.  The experiments conducted so far raised
an enthusiastic response from pretesters, so we believe that
internationalization of free software is destined to succeed.
7537
To suggest clarifications, additions or corrections to this
document, please e-mail @file{coordinator@@translationproject.org}.
7540
7541@node Discussions
7542@section Discussions
7543
7544@strong{ NOTE: } This documentation section is outdated and needs to be
7545revised.
7546
Facing this internationalization effort, a few users expressed their
concerns.  Some of these doubts are presented and discussed here.
7549
7550@itemize @bullet
7551@item Smaller groups
7552
Some languages are not spoken by a very large number of people, so people
speaking them sometimes consider that there may not be all that much
demand for such versions of free software packages.  Moreover, many people
being @emph{into computers}, in some countries, generally seem to prefer
English versions of their software.

On the other hand, people might enjoy their own language a lot, and be
very motivated to give themselves the pleasure of having their
beloved free software speaking their mother tongue.  They do themselves
a personal favor, and do not pay that much attention to the number of
people benefiting from their work.
7564
7565@item Misinterpretation
7566
7567Other users are shy to push forward their own language, seeing in this
7568some kind of misplaced propaganda.  Someone thought there must be some
7569users of the language over the networks pestering other people with it.
7570
7571But any spoken language is worth localization, because there are
7572people behind the language for whom the language is important and
7573dear to their hearts.
7574
7575@item Odd translations
7576
The biggest problem is to find the right translations so that
everybody can understand the messages.  Translations are usually a
little odd.  Some people get used to English, to the extent they may
find translations into their own language ``rather pushy, obnoxious
and sometimes even hilarious.''  As a French speaking man, I have
the experience of those instruction manuals for goods, so poorly
translated into French in Korea or Taiwan@dots{}
7584
7585The fact is that we sometimes have to create a kind of national
7586computer culture, and this is not easy without the collaboration of
7587many people liking their mother tongue.  This is why translations are
7588better achieved by people knowing and loving their own language, and
7589ready to work together at improving the results they obtain.
7590
7591@item Dependencies over the GPL or LGPL
7592
7593Some people wonder if using GNU @code{gettext} necessarily brings their
7594package under the protective wing of the GNU General Public License or
7595the GNU Lesser General Public License, when they do not want to make
7596their program free, or want other kinds of freedom.  The simplest
7597answer is ``normally not''.
7598
7599The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the
7600contents of @code{libintl}, is covered by the GNU Lesser General Public
7601License.  The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the
7602rest of the GNU @code{gettext} package, is covered by the GNU General
7603Public License.
7604
7605The mere marking of localizable strings in a package, or conditional
7606inclusion of a few lines for initialization, is not really including
7607GPL'ed or LGPL'ed code.  However, since the localization routines in
7608@code{libintl} are under the LGPL, the LGPL needs to be considered.
7609It gives the right to distribute the complete unmodified source of
7610@code{libintl} even with non-free programs.  It also gives the right
7611to use @code{libintl} as a shared library, even for non-free programs.
7612But it gives the right to use @code{libintl} as a static library or
7613to incorporate @code{libintl} into another library only to free
7614software.
7615
7616@end itemize
7617
7618@node Organization
7619@section Organization
7620
7621@strong{ NOTE: } This documentation section is outdated and needs to be
7622revised.
7623
On a larger scale, the true solution would be to organize some kind of
fairly precise set up in which volunteers could participate.  I gave
some thought to this idea lately, and realize there will be some
touchy points.  I thought of writing to Richard Stallman to launch
such a project, but feel it might be good to shake out the ideas
between ourselves first.  Most probably Linux International has
some experience in the field already, or would like to orchestrate
the volunteer work, maybe.  Food for thought, in any case!

I guess we have to set up something early, somehow, that will help
many possible contributors of the same language to interlock and avoid
work duplication, and further be put in contact for solving together
problems particular to their tongue (in most languages, there are many
difficulties peculiar to translating technical English).  My Swedish
contributor acknowledged these difficulties, and I'm well aware of
them for French.
7640
7641This is surely not a technical issue, but we should manage so the
7642effort of locale contributors be maximally useful, despite the national
7643team layer interface between contributors and maintainers.
7644
7645The Translation Project needs some setup for coordinating language
7646coordinators.  Localizing evolving programs will surely
7647become a permanent and continuous activity in the free software community,
7648once well started.
7649The setup should be minimally completed and tested before GNU
7650@code{gettext} becomes an official reality.  The e-mail address
7651@file{coordinator@@translationproject.org} has been set up for receiving
7652offers from volunteers and general e-mail on these topics.  This address
7653reaches the Translation Project coordinator.
7654
7655@menu
7656* Central Coordination::        Central Coordination
7657* National Teams::              National Teams
7658* Mailing Lists::               Mailing Lists
7659@end menu
7660
7661@node Central Coordination
7662@subsection Central Coordination
7663
I also think GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups.  Some kind of group
of groups.  My opinion is that it would be good that GNU delegates
this task to a small group of collaborating volunteers, shortly.
Perhaps a list of these national committees can be published in
@file{gnu.announce}.

My role as coordinator would simply be to refer to Ulrich any German
speaking volunteer interested in the localization of free software packages, and
maybe helping national groups to initially organize, while maintaining
national registries until national groups are ready to take over.
In fact, the coordinator should make it easy for volunteers to get in contact with
one another for creating national teams, which should then select
one coordinator per language, or country (regionalized language).
If well done, the coordination should be useful without being an
overwhelming task, for the time it takes to put delegations in place.
7680
7681@node National Teams
7682@subsection National Teams
7683
7684I suggest we look for volunteer coordinators/editors for individual
7685languages.  These people will scan contributions of translation files
7686for various programs, for their own languages, and will ensure high
7687and uniform standards of diction.
7688
From my current experience with other people these days, those who
provide localizations are very enthusiastic about the process, and are
more interested in the localization process than in the program they
localize, and want to do many programs, not just one.  This seems
to confirm that having a coordinator/editor for each language is a
good idea.
7695
7696We need to choose someone who is good at writing clear and concise
7697prose in the language in question.  That is hard---we can't check
7698it ourselves.  So we need to ask a few people to judge each others'
7699writing and select the one who is best.
7700
I announced my prerelease to a few dozen people, and you would not
believe all the discussions it generated already.  I shudder to think
what will happen when this is launched, for true, officially,
world wide.  Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?
7706
I assume that your German is not much better than my French so that
I would not be able to judge these formulations.  What I would
suggest is that for each language there is a group of people who
maintain the PO files and judge changes.  I suspect there will
be cultural differences between how such groups of people will behave.
Some will have relaxed ways, reach consensus easily, and have anyone
of the group relate to the maintainers, while others will fight to
the death, organize heavy administrations up to national standards, and
use strict channels.
7716
The German team is setting a good example.  Right now, they are
maybe half a dozen people revising translations of each other and
discussing the linguistic issues.  I do not even have all the names.
Ulrich Drepper is taking care of coordinating the German team.
He subscribed to all my pretest lists, so I do not even have to warn
him specifically of incoming releases.
7723
I'm sure that it is a good idea to get teams for each language working
7725on translations.  That will make the translations better and more
7726consistent.
7727
7728@menu
7729* Sub-Cultures::                Sub-Cultures
7730* Organizational Ideas::        Organizational Ideas
7731@end menu
7732
7733@node Sub-Cultures
7734@subsubsection Sub-Cultures
7735
Taking French as an example, there are a few sub-cultures around computers
which developed diverging vocabularies.  Picking volunteers here and
there without addressing this problem in an organized way, early in the
7739project, might produce a distasteful mix of internationalized programs,
7740and possibly trigger endless quarrels among those who really care.
7741
7742Keeping some kind of unity in the way French localization of
7743internationalized programs is achieved is a difficult (and delicate) job.
Knowing the Latin character of French people (:-), if we take this
the wrong way, we could end up nowhere, or waste a lot of energy.
Maybe we should begin to address this problem seriously @emph{before}
GNU @code{gettext} is officially published.  And I suspect that this
7748means soon!
7749
7750@node Organizational Ideas
7751@subsubsection Organizational Ideas
7752
7753I expect the next big changes after the official release.  Please note
7754that I use the German translation of the short GPL message.  We need
to set a few good examples before the localization goes out for real
7756in the free software community.  Here are a few points to discuss:
7757
7758@itemize @bullet
7759@item
7760Each group should have one FTP server (at least one master).
7761
7762@item
7763The files on the server should reflect the latest version (of
course!) and it should also contain an RCS directory with the
7765corresponding archives (I don't have this now).
7766
7767@item
7768There should also be a ChangeLog file (this is more useful than the
RCS archive but can be generated automatically from the latter by
7770Emacs).
7771
7772@item
A @dfn{core group} should judge questionable changes (for now
this group consists solely of me, but I ask some others occasionally;
7775this also seems to work).
7776
7777@end itemize
7778
7779@node Mailing Lists
7780@subsection Mailing Lists
7781
7782If we get any inquiries about GNU @code{gettext}, send them on to:
7783
7784@example
7785@file{coordinator@@translationproject.org}
7786@end example
7787
The @file{*-pretest} lists are quite useful to me; maybe the idea could
be generalized to many GNU and non-GNU packages.  But to each maintainer
his or her own way!
7791
7792Fran@,{c}ois, we have a mechanism in place here at
7793@file{gnu.ai.mit.edu} to track teams, support mailing lists for
7794them and log members.  We have a slight preference that you use it.
7795If this is OK with you, I can get you clued in.
7796
7797Things are changing!  A few years ago, when Daniel Fekete and I
7798asked for a mailing list for GNU localization, nested at the FSF, we
were politely invited to organize it anywhere else, and so we did.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
7802@code{majordomo}.  These lists have been @emph{very} dependable
7803so far@dots{}
7804
7805I suspect that the German team will organize itself a mailing list
7806located in Germany, and so forth for other countries.  But before they
organize for real, it could surely be useful to offer mailing lists
located at the FSF to each national team.  So yes, please explain to me
7809how I should proceed to create and handle them.
7810
7811We should create temporary mailing lists, one per country, to help
people organize.  Temporary, because once regrouped and structured, it
would be fair for the volunteers from each country to bring @emph{their} list
back home and manage it as they want.  My feeling is that, in the long
7815run, each team should run its own list, from within their country.
7816There also should be some central list to which all teams could
7817subscribe as they see fit, as long as each team is represented in it.
7818
7819@node Information Flow
7820@section Information Flow
7821
7822@strong{ NOTE: } This documentation section is outdated and needs to be
7823revised.
7824
There will surely be some discussion about these messages after the
packages are finally released.  If people now send you some proposals
for better messages, how do you proceed?  Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
7829receive both the translations and the coordination concerns about them.
7830
7831If I put one of my things to pretest, Ulrich receives the announcement
7832and passes it on to the German team, who make last minute revisions.
7833Then he submits the translation files to me @emph{as the maintainer}.
7834For free packages I do not maintain, I would not even hear about it.
7835This scheme could be made to work for the whole Translation Project,
7836I think.  For security reasons, maybe Ulrich (national coordinators,
in fact) should update the central registry kept at the Translation Project
7838(Jim, me, or Len's recruits) once in a while.
7839
7840In December/January, I was aggressively ready to internationalize
7841all of GNU, giving myself the duty of one small GNU package per week
7842or so, taking many weeks or months for bigger packages.  But it does
7843not work this way.  I first did all the things I'm responsible for.
7844I've nothing against some missionary work on other maintainers, but
7845I'm also losing a lot of energy over it---same debates over again.
7846
7847And when the first localized packages are released we'll get a lot of
7848responses about ugly translations :-).  Surely, and we need to have
7849beforehand a fairly good idea about how to handle the information
7850flow between the national teams and the package maintainers.
7851
7852Please start saving somewhere a quick history of each PO file.  I know
7853for sure that the file format will change, allowing for comments.
It would be nice if each file had a kind of log, and references for
those who want to submit comments or gripes, or otherwise contribute.
I sent a proposal for a fast and flexible format, but it has not yet
been accepted by the GNU deciders.  I'll tell you when I
7858have more information about this.
7859
7860@node Translating plural forms
7861@section Translating plural forms
7862
7863@cindex plural forms, translating
7864Suppose you are translating a PO file, and it contains an entry like this:
7865
7866@smallexample
7867#, c-format
7868msgid "One file removed"
7869msgid_plural "%d files removed"
7870msgstr[0] ""
7871msgstr[1] ""
7872@end smallexample
7873
7874@noindent
7875What does this mean? How do you fill it in?
7876
7877Such an entry denotes a message with plural forms, that is, a message where
7878the text depends on a cardinal number.  The general form of the message,
7879in English, is the @code{msgid_plural} line.  The @code{msgid} line is the
7880English singular form, that is, the form for when the number is equal to 1.
7881More details about plural forms are explained in @ref{Plural forms}.
7882
7883The first thing you need to look at is the @code{Plural-Forms} line in the
7884header entry of the PO file.  It contains the number of plural forms and a
7885formula.  If the PO file does not yet have such a line, you have to add it.
7886It only depends on the language into which you are translating.  You can
7887get this info by using the @code{msginit} command (see @ref{Creating}) --
7888it contains a database of known plural formulas -- or by asking other
7889members of your translation team.
7890
7891Suppose the line looks as follows:
7892
7893@smallexample
7894"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
7895"%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
7896@end smallexample
7897
7898It's logically one line; recall that the PO file formatting is allowed to
7899break long lines so that each physical line fits in 80 monospaced columns.
7900
7901The value of @code{nplurals} here tells you that there are three plural
7902forms.  The first thing you need to do is to ensure that the entry contains
7903an @code{msgstr} line for each of the forms:
7904
7905@smallexample
7906#, c-format
7907msgid "One file removed"
7908msgid_plural "%d files removed"
7909msgstr[0] ""
7910msgstr[1] ""
7911msgstr[2] ""
7912@end smallexample
7913
Then translate the @code{msgid_plural} line and fill it into each
7915@code{msgstr} line:
7916
7917@smallexample
7918#, c-format
7919msgid "One file removed"
7920msgid_plural "%d files removed"
7921msgstr[0] "%d slika uklonjenih"
7922msgstr[1] "%d slika uklonjenih"
7923msgstr[2] "%d slika uklonjenih"
7924@end smallexample
7925
7926Now you can refine the translation so that it matches the plural form.
7927According to the formula above, @code{msgstr[0]} is used when the number
7928ends in 1 but does not end in 11; @code{msgstr[1]} is used when the number
7929ends in 2, 3, 4, but not in 12, 13, 14; and @code{msgstr[2]} is used in
7930all other cases.  With this knowledge, you can refine the translations:
7931
7932@smallexample
7933#, c-format
7934msgid "One file removed"
7935msgid_plural "%d files removed"
7936msgstr[0] "%d slika je uklonjena"
7937msgstr[1] "%d datoteke uklonjenih"
7938msgstr[2] "%d slika uklonjenih"
7939@end smallexample
7940
You may have noticed that in the English singular form (@code{msgid}) the number
7942placeholder could be omitted and replaced by the numeral word ``one''.
7943Can you do this in your translation as well?
7944
7945@smallexample
7946msgstr[0] "jednom datotekom je uklonjen"
7947@end smallexample
7948
7949@noindent
7950Well, it depends on whether @code{msgstr[0]} applies only to the number 1,
7951or to other numbers as well.  If, according to the plural formula,
7952@code{msgstr[0]} applies only to @code{n == 1}, then you can use the
7953specialized translation without the number placeholder.  In our case,
7954however, @code{msgstr[0]} also applies to the numbers 21, 31, 41, etc.,
7955and therefore you cannot omit the placeholder.
7956
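To see which @code{msgstr} a given number selects, the formula can be
evaluated directly.  The following is only a sketch: it copies the
example formula from above into POSIX shell arithmetic, and the function
name @code{plural_index} is made up for this illustration.

@smallexample
plural_index() (
  # The subshell body keeps n local to the function.
  n=$1
  echo $(( n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2 ))
)

for n in 1 2 5 11 12 21 22 25 101 111; do
  echo "$n selects msgstr[$(plural_index "$n")]"
done
@end smallexample

@noindent
In this run the numbers 1, 21 and 101 all select @code{msgstr[0]}, which
is why that form has to keep the number placeholder.
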
7957@node Prioritizing messages
7958@section Prioritizing messages: How to determine which messages to translate first
7959
7960A translator sometimes has only a limited amount of time per week to
7961spend on a package, and some packages have quite large message catalogs
(over 1000 messages).  Therefore she wishes to translate first the
messages that are the most visible to the user, or that occur most frequently.
This section describes how to determine these ``most urgent'' messages.
It also applies to determining the ``next most urgent'' messages after the
message catalog has already been partially translated.

In a first step, she uses the programs as a user would.  While she
7969does this, the GNU @code{gettext} library logs into a file the not yet
7970translated messages for which a translation was requested from the program.
7971
7972In a second step, she uses the PO mode to translate precisely this set
7973of messages.
7974
7975@vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable}
Here are more details.  The GNU @code{libintl} library (but not the
corresponding functions in GNU @code{libc}) supports an environment variable
@code{GETTEXT_LOG_UNTRANSLATED}, whose value is the name of a log file.
The GNU @code{libintl} library will
log into this file the messages for which @code{gettext()} and related
functions couldn't find the translation.  If the file doesn't exist, it
will be created as needed.  On systems with GNU @code{libc}, a shared library
@samp{preloadable_libintl.so} is provided that can be used with the ELF
@samp{LD_PRELOAD} mechanism.
7984
7985So, in the first step, the translator uses these commands on systems with
7986GNU @code{libc}:
7987
7988@smallexample
7989$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
7990$ export LD_PRELOAD
7991$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
7992$ export GETTEXT_LOG_UNTRANSLATED
7993@end smallexample
7994
7995@noindent
7996and these commands on other systems:
7997
7998@smallexample
7999$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
8000$ export GETTEXT_LOG_UNTRANSLATED
8001@end smallexample
8002
8003Then she uses and peruses the programs.  (It is a good and recommended
8004practice to use the programs for which you provide translations: it
8005gives you the needed context.)  When done, she removes the environment
8006variables:
8007
8008@smallexample
8009$ unset LD_PRELOAD
8010$ unset GETTEXT_LOG_UNTRANSLATED
8011@end smallexample
8012
8013The second step starts with removing duplicates:
8014
8015@smallexample
8016$ msguniq $HOME/gettextlogused > missing.po
8017@end smallexample
8018
The result is a PO file, but it needs some preprocessing before a PO file editor
8020can be used with it.  First, it is a multi-domain PO file, containing
8021messages from many translation domains.  Second, it lacks all translator
8022comments and source references.  Here is how to get a list of the affected
8023translation domains:
8024
8025@smallexample
8026$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
8027@end smallexample
8028
8029Then the translator can handle the domains one by one.  For simplicity,
let's use shell variables to denote the language, domain and source
8031package.
8032
8033@smallexample
8034$ lang=nl             # your language
8035$ domain=coreutils    # the name of the domain to be handled
8036$ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from
8037@end smallexample
8038
8039She takes the latest copy of @file{$lang.po} from the Translation Project,
8040or from the package (in most cases, @file{$package/po/$lang.po}), or
8041creates a fresh one if she's the first translator (see @ref{Creating}).
She then uses the following commands to mark the not urgent messages as
``obsolete''.  (This doesn't mean that these messages, translated and
untranslated ones alike, will go away.  It simply means that the PO file editor
8045will ignore them in the following editing session.)
8046
8047@smallexample
8048$ msggrep --domain=$domain missing.po | grep -v '^domain' \
8049  > $domain-missing.po
8050$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
8051  > $domain.$lang-urgent.po
8052@end smallexample
8053
Then she translates @file{$domain.$lang-urgent.po} by use of a PO file editor
8055(@pxref{Editing}).
8056(FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also
8057preserve obsolete messages, as they should.)
8058Finally she restores the not urgent messages (with their earlier
8059translations, for those which were already translated) through this command:
8060
8061@smallexample
8062$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
8063  > $domain.$lang.po
8064@end smallexample
8065
8066Then she can submit @file{$domain.$lang.po} and proceed to the next domain.
8067
8068@node Maintainers
8069@chapter The Maintainer's View
8070@cindex package maintainer's view of @code{gettext}
8071
8072The maintainer of a package has many responsibilities.  One of them
8073is ensuring that the package will install easily on many platforms,
8074and that the magic we described earlier (@pxref{Users}) will work
8075for installers and end users.
8076
8077Of course, there are many possible ways by which GNU @code{gettext}
8078might be integrated in a distribution, and this chapter does not cover
8079them in all generality.  Instead, it details one possible approach which
is especially suited to free software distributions following GNU
standards, or even better, Gnits standards, because GNU @code{gettext}
is meant to help internationalize the whole GNU
8083project, and as many other good free packages as possible.  So, the
8084maintainer's view presented here presumes that the package already has
8085a @file{configure.ac} file and uses GNU Autoconf.
8086
8087Nevertheless, GNU @code{gettext} may surely be useful for free packages
8088not following GNU standards and conventions, but the maintainers of such
8089packages might have to show imagination and initiative in organizing
their distributions so that @code{gettext} works for them in all situations.
There are surely many such packages out there.
8092
8093Even if @code{gettext} methods are now stabilizing, slight adjustments
8094might be needed between successive @code{gettext} versions, so you
should ideally reread this chapter in subsequent releases, looking
8096for changes.
8097
8098@menu
8099* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
8100* Prerequisites::               Prerequisite Works
8101* gettextize Invocation::       Invoking the @code{gettextize} Program
8102* Adjusting Files::             Files You Must Create or Alter
8103* autoconf macros::             Autoconf macros for use in @file{configure.ac}
8104* Version Control Issues::
8105* Release Management::          Creating a Distribution Tarball
8106@end menu
8107
8108@node Flat and Non-Flat
8109@section Flat or Non-Flat Directory Structures
8110
8111Some free software packages are distributed as @code{tar} files which unpack
in a single directory; these are said to be @dfn{flat} distributions.
Other free software packages have a one-level hierarchy of subdirectories, using
8114for example a subdirectory named @file{doc/} for the Texinfo manual and
8115man pages, another called @file{lib/} for holding functions meant to
8116replace or complement C libraries, and a subdirectory @file{src/} for
8117holding the proper sources for the package.  These other distributions
8118are said to be @dfn{non-flat}.
8119
8120We cannot say much about flat distributions.  A flat
8121directory structure has the disadvantage of increasing the difficulty
8122of updating to a new version of GNU @code{gettext}.  Also, if you have
8123many PO files, this could somewhat pollute your single directory.
8124Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
8125scripts, @code{sed} scripts and complicated Makefile rules, which don't
fit well into an existing flat structure.  For these reasons, we
recommend using the non-flat approach in this case as well.
8128
8129Maybe because GNU @code{gettext} itself has a non-flat structure,
8130we have more experience with this approach, and this is what will be
described in the remainder of this chapter.  Some maintainers might
8132use this as an opportunity to unflatten their package structure.
8133
8134@node Prerequisites
8135@section Prerequisite Works
8136@cindex converting a package to use @code{gettext}
8137@cindex migration from earlier versions of @code{gettext}
8138@cindex upgrading to new versions of @code{gettext}
8139
Some preparatory work is required before you can use GNU @code{gettext}
in one of your packages.  This work is general in nature and does not
fit the point by point descriptions used in the remainder
of this chapter, so we describe it here.
8144
8145@itemize @bullet
8146@item
8147Before attempting to use @code{gettextize} you should install some
8148other packages first.
8149Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
8150@code{gettext} are already installed at your site, and if not, proceed
to do this first.  If you need to install them, beware that
8152GNU @code{m4} must be fully installed before GNU Autoconf is even
8153@emph{configured}.
8154
8155To further ease the task of a package maintainer the @code{automake}
8156package was designed and implemented.  GNU @code{gettext} now uses this
8157tool and the @file{Makefile} in the @file{po/} directory therefore
8158knows about all the goals necessary for using @code{automake}.
8159
8160Those four packages are only needed by you, as a maintainer; the
8161installers of your own package and end users do not really need any of
8162GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
8163for successfully installing and running your package, with messages
8164properly translated.  But this is not completely true if you provide
8165internationalized shell scripts within your own package: GNU
@code{gettext} must then be installed at the user site if the end users
8167want to see the translation of shell script messages.
8168
8169@item
8170Your package should use Autoconf and have a @file{configure.ac} or
8171@file{configure.in} file.
If it does not, you have to learn how.  The Autoconf documentation
is quite well written; it is a good idea to print it and get
familiar with it.
8175
8176@item
8177Your C sources should have already been modified according to
8178instructions given earlier in this manual.  @xref{Sources}.
8179
8180@item
8181Your @file{po/} directory should receive all PO files submitted to you
8182by the translator teams, each having @file{@var{ll}.po} as a name.
It is usually not easy to get translation
work done before your package gets internationalized and available!
8185Since the cycle has to start somewhere, the easiest for the maintainer
8186is to start with absolutely no PO files, and wait until various
8187translator teams get interested in your package, and submit PO files.
8188
8189@end itemize
8190
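A quick way to see what is already installed is to ask each tool for its
version.  This is a minimal sketch, assuming the GNU versions of the
tools (which accept @option{--version}); the helper name
@code{report_tool} is made up for this illustration.

@smallexample
report_tool() (
  # Print the first line of "TOOL --version", or a note if it is missing.
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version | head -n 1
  else
    echo "$1: not found"
  fi
)

for tool in m4 autoconf automake gettext; do
  report_tool "$tool"
done
@end smallexample
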
8191It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions.  As a maintainer, your role is
to authenticate the origin of the submission as being the representative
of the appropriate translation team of the Translation Project (forward
8195the submission to @file{coordinator@@translationproject.org} in case of doubt),
8196to ensure that the PO file format is not severely broken and does not
8197prevent successful installation, and for the rest, to merely put these
8198PO files in @file{po/} for distribution.
8199
8200As a maintainer, you do not have to take on your shoulders the
8201responsibility of checking if the translations are adequate or
8202complete, and should avoid diving into linguistic matters.  Translation
teams drive themselves and are fully responsible for their linguistic
choices for the Translation Project.  Keep in mind that translator teams are @emph{not}
driven by maintainers.  You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
8208their team.
8209
8210Maintainers should @emph{never ever} apply PO file bug reports
8211themselves, short-cutting translation teams.  If some translator has
difficulty getting some of her points through her team, it should not be
8213an option for her to directly negotiate translations with maintainers.
8214Teams ought to settle their problems themselves, if any.  If you, as
8215a maintainer, ever think there is a real problem with a team, please
8216never try to @emph{solve} a team's problem on your own.
8217
8218@node gettextize Invocation
8219@section Invoking the @code{gettextize} Program
8220
8221@include gettextize.texi
8222
8223@node Adjusting Files
8224@section Files You Must Create or Alter
8225@cindex @code{gettext} files
8226
8227Besides files which are automatically added through @code{gettextize},
8228there are many files needing revision for properly interacting with
8229GNU @code{gettext}.  If you are closely following GNU standards for
8230Makefile engineering and auto-configuration, the adaptations should
be easier to achieve.

Here is a list of files, each one followed by a point by point description
of all alterations it needs.  Many examples are taken from the GNU
8236@code{gettext} @value{VERSION} distribution itself, or from the GNU
8237@code{hello} distribution (@uref{https://www.gnu.org/software/hello}).
8238You may indeed refer to the source code of the GNU @code{gettext} and
8239GNU @code{hello} packages, as they are intended to be good examples for
8240using GNU gettext functionality.
8241
8242@menu
8243* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
8244* po/LINGUAS::                  @file{LINGUAS} in @file{po/}
8245* po/Makevars::                 @file{Makevars} in @file{po/}
8246* po/Rules-*::                  Extending @file{Makefile} in @file{po/}
8247* configure.ac::                @file{configure.ac} at top level
8248* config.guess::                @file{config.guess}, @file{config.sub} at top level
8249* mkinstalldirs::               @file{mkinstalldirs} at top level
8250* aclocal::                     @file{aclocal.m4} at top level
8251* config.h.in::                 @file{config.h.in} at top level
8252* Makefile::                    @file{Makefile.in} at top level
8253* src/Makefile::                @file{Makefile.in} in @file{src/}
8254* lib/gettext.h::               @file{gettext.h} in @file{lib/}
8255@end menu
8256
8257@node po/POTFILES.in
8258@subsection @file{POTFILES.in} in @file{po/}
8259@cindex @file{POTFILES.in} file
8260
8261The @file{po/} directory should receive a file named
8262@file{POTFILES.in}.  This file tells which files, among all program
8263sources, have marked strings needing translation.  Here is an example
8264of such a file:
8265
8266@example
8267@group
8268# List of source files containing translatable strings.
8269# Copyright (C) 1995 Free Software Foundation, Inc.
8270
8271# Common library files
8272lib/error.c
8273lib/getopt.c
8274lib/xmalloc.c
8275
8276# Package source files
8277src/gettext.c
8278src/msgfmt.c
8279src/xgettext.c
8280@end group
8281@end example
8282
8283@noindent
Hash-marked comments and blank lines are ignored.  All other lines
8285list those source files containing strings marked for translation
8286(@pxref{Mark Keywords}), in a notation relative to the top level
8287of your whole distribution, rather than the location of the
8288@file{POTFILES.in} file itself.
8289
8290When a C file is automatically generated by a tool, like @code{flex} or
8291@code{bison}, that doesn't introduce translatable strings by itself,
8292it is recommended to list in @file{po/POTFILES.in} the real source file
8293(ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the
8294case of @code{bison}), not the generated C file.
8295
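A first draft of this list can be produced mechanically and then reviewed
by hand.  The following is only a sketch under stated assumptions: the C
sources are assumed to live in @file{src/} and @file{lib/}, translatable
strings are assumed to be marked with @code{gettext} or a macro named
@code{_}, and the function name @code{list_candidates} is made up for
this illustration.

@smallexample
list_candidates() (
  # Work relative to the top level, as POTFILES.in expects.
  cd "$1" || exit 1
  # Print the names of C files that contain a marked string.
  grep -l -e '_("' -e 'gettext *("' src/*.c lib/*.c 2>/dev/null | sort
)

list_candidates .
@end smallexample

@noindent
The output uses paths relative to the top level of the distribution, so
after review it can be pasted into @file{po/POTFILES.in}.
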
8296@node po/LINGUAS
8297@subsection @file{LINGUAS} in @file{po/}
8298@cindex @file{LINGUAS} file
8299
8300The @file{po/} directory should also receive a file named
8301@file{LINGUAS}.  This file contains the list of available translations.
It is a whitespace-separated list.  Hash-marked comments and blank lines
8303are ignored.  Here is an example file:
8304
8305@example
8306@group
8307# Set of available languages.
8308de fr
8309@end group
8310@end example
8311
8312@noindent
8313This example means that German and French PO files are available, so
8314that these languages are currently supported by your package.  If you
8315want to further restrict, at installation time, the set of installed
8316languages, this should not be done by modifying the @file{LINGUAS} file,
8317but rather by using the @code{LINGUAS} environment variable
8318(@pxref{Installers}).
8319
It is recommended that you add the ``languages'' @samp{en@@quot} and
8321@samp{en@@boldquot} to the @code{LINGUAS} file.  @code{en@@quot} is a
8322variant of English message catalogs (@code{en}) which uses real quotation
8323marks instead of the ugly looking asymmetric ASCII substitutes @samp{`}
8324and @samp{'}.  @code{en@@boldquot} is a variant of @code{en@@quot} that
8325additionally outputs quoted pieces of text in a bold font, when used in
8326a terminal emulator which supports the VT100 escape sequences (such as
8327@code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode).
8328
8329These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot}
8330are constructed automatically, not by translators; to support them, you
8331need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed},
8332@file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin}
8333in the @file{po/} directory.  You can copy them from GNU gettext's @file{po/}
8334directory; they are also installed by running @code{gettextize}.
8335
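With these two catalogs enabled, the example @file{LINGUAS} file from
above might look as follows.  This is only an illustration; the list of
real translations (@samp{de fr}) is carried over from the earlier
example.

@example
@group
# Set of available languages.
de fr
# Automatically constructed variants of English.
en@@quot en@@boldquot
@end group
@end example
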
8336@node po/Makevars
8337@subsection @file{Makevars} in @file{po/}
8338@cindex @file{Makevars} file
8339
8340The @file{po/} directory also has a file named @file{Makevars}.  It
8341contains variables that are specific to your project.  @file{po/Makevars}
8342gets inserted into the @file{po/Makefile} when the latter is created.
8343The variables thus take effect when the POT file is created or updated,
8344and when the message catalogs get installed.
8345
8346The first three variables can be left unmodified if your package has a
8347single message domain and, accordingly, a single @file{po/} directory.
8348Only packages which have multiple @file{po/} directories at different
locations need to adjust the first three variables defined in
8350@file{Makevars}.
8351
8352As an alternative to the @code{XGETTEXT_OPTIONS} variable, it is also
8353possible to specify @code{xgettext} options through the
8354@code{AM_XGETTEXT_OPTION} autoconf macro.  See @ref{AM_XGETTEXT_OPTION}.
8355
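As a rough orientation, the beginning of a @file{po/Makevars} file
typically looks like the abridged sketch below.  The values shown are
assumptions for a package with a single @file{po/} directory; compare
them with the template installed by your version of @code{gettextize}
rather than copying them blindly.

@example
@group
# Message domain; usually the same as the package name.
DOMAIN = $(PACKAGE)

# Location of this directory relative to the top level.
subdir = po
top_builddir = ..

# Options passed to xgettext when the POT file is created or updated.
XGETTEXT_OPTIONS = --keyword=_ --keyword=N_
@end group
@end example
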
8356@node po/Rules-*
8357@subsection Extending @file{Makefile} in @file{po/}
8358@cindex @file{Makefile.in.in} extensions
8359
8360All files called @file{Rules-*} in the @file{po/} directory get appended to
8361the @file{po/Makefile} when it is created.  They present an opportunity to
8362add rules for special PO files to the Makefile, without needing to mess
8363with @file{po/Makefile.in.in}.
8364
8365@cindex quotation marks
8366@vindex LANGUAGE@r{, environment variable}
8367GNU gettext comes with a @file{Rules-quot} file, containing rules for
8368building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}.  The
8369effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE}
8370environment variable to @samp{en@@quot} will get messages with proper
8371looking symmetric Unicode quotation marks instead of abusing the ASCII
8372grave accent and the ASCII apostrophe for indicating quotations.  To
8373enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS}
8374file.  The effect of @file{en@@boldquot.po} is that people who set
8375@code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation
8376marks, but also the quoted text will be shown in a bold font on terminals
8377and consoles.  This catalog is useful only for command-line programs, not
8378GUI programs.  To enable it, similarly add @code{en@@boldquot} to the
8379@file{po/LINGUAS} file.
8380
8381Similarly, you can create rules for building message catalogs for the
8382@file{sr@@latin} locale -- Serbian written with the Latin alphabet --
8383from those for the @file{sr} locale -- Serbian written with Cyrillic
8384letters.  See @ref{msgfilter Invocation}.
8385
8386@node configure.ac
8387@subsection @file{configure.ac} at top level
8388
@file{configure.ac} (or @file{configure.in}) is the source from which
8390@code{autoconf} generates the @file{configure} script.
8391
8392@enumerate
8393@item Declare the package and version.
8394@cindex package and version declaration in @file{configure.ac}
8395
8396This is done by a set of lines like these:
8397
8398@example
8399PACKAGE=gettext
8400VERSION=@value{VERSION}
8401AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
8402AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
8403AC_SUBST(PACKAGE)
8404AC_SUBST(VERSION)
8405@end example
8406
8407@noindent
8408or, if you are using GNU @code{automake}, by a line like this:
8409
8410@example
8411AM_INIT_AUTOMAKE(gettext, @value{VERSION})
8412@end example
8413
8414@noindent
Of course, replace @samp{gettext} with the name of your package,
and @samp{@value{VERSION}} with its version number, exactly as they
should appear in the packaged @code{tar} file name of your distribution
(@file{gettext-@value{VERSION}.tar.gz}, here).
8419
8420@item Check for internationalization support.
8421
8422Here is the main @code{m4} macro for triggering internationalization
8423support.  Just add this line to @file{configure.ac}:
8424
8425@example
8426AM_GNU_GETTEXT([external])
8427@end example
8428
8429@noindent
This call is purposely simple, even though it triggers a lot of
configure-time checks and actions.
8432
8433@item Have output files created.
8434
The @code{AC_OUTPUT} directive, at the end of your @file{configure.ac}
file, needs to be modified as follows:
8437
8438@example
8439AC_OUTPUT([@var{existing configuration files} po/Makefile.in],
8440[@var{existing additional actions}])
8441@end example
8442
8443The modification to the first argument to @code{AC_OUTPUT} asks
8444for substitution in the @file{po/} directory.
8445Note the @samp{.in} suffix used for @file{po/} only.  This is because
8446the distributed file is really @file{po/Makefile.in.in}.
8447
8448@end enumerate
8449
8450@node config.guess
8451@subsection @file{config.guess}, @file{config.sub} at top level
8452
8453You need to add the GNU @file{config.guess} and @file{config.sub} files
8454to your distribution.  They are needed because the @code{AM_ICONV} macro
8455contains knowledge about specific platforms and therefore needs to
8456identify the platform.
8457
8458You can obtain the newest version of @file{config.guess} and
8459@file{config.sub} from the @samp{config} project at
@file{https://savannah.gnu.org/}.  The commands to fetch them are:
8461@smallexample
8462$ wget -O config.guess 'https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
8463$ wget -O config.sub 'https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'
8464@end smallexample
8465@noindent
8466Less recent versions are also contained in the GNU @code{automake} and
8467GNU @code{libtool} packages.
8468
8469Normally, @file{config.guess} and @file{config.sub} are put at the
top level of a distribution.  But it is also possible to put them in a
subdirectory, together with other configuration support files like
8472@file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}.
8473All you need to do, other than moving the files, is to add the following line
8474to your @file{configure.ac}.
8475
8476@example
8477AC_CONFIG_AUX_DIR([@var{subdir}])
8478@end example
8479
8480@node mkinstalldirs
8481@subsection @file{mkinstalldirs} at top level
8482@cindex @file{mkinstalldirs} file
8483
8484With earlier versions of GNU gettext, you needed to add the GNU
8485@file{mkinstalldirs} script to your distribution.  This is not needed any
8486more.  You can remove it.
8487
8488@node aclocal
8489@subsection @file{aclocal.m4} at top level
8490@cindex @file{aclocal.m4} file
8491
If you do not have an @file{aclocal.m4} file in your distribution,
the simplest approach is to concatenate the files @file{gettext.m4},
8494@file{host-cpu-c-abi.m4}, @file{intlmacosx.m4}, @file{iconv.m4},
8495@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4}, @file{nls.m4},
8496@file{po.m4}, @file{progtest.m4} from GNU @code{gettext}'s @file{m4/}
8497directory into a single file.
8498
8499If you already have an @file{aclocal.m4} file, then you will have
8500to merge the said macro files into your @file{aclocal.m4}.  Note that if
8501you are upgrading from a previous release of GNU @code{gettext}, you
8502should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
8503etc.), as they usually
8504change a little from one release of GNU @code{gettext} to the next.
8505Their contents may vary as we get more experience with strange systems
8506out there.
8507
8508You should be using GNU @code{automake} 1.9 or newer.  With it, you need
8509to copy the files @file{gettext.m4}, @file{host-cpu-c-abi.m4},
8510@file{intlmacosx.m4}, @file{iconv.m4}, @file{lib-ld.m4}, @file{lib-link.m4},
8511@file{lib-prefix.m4}, @file{nls.m4}, @file{po.m4}, @file{progtest.m4} from
8512GNU @code{gettext}'s @file{m4/} directory to a subdirectory named @file{m4/}
8513and add the line
8514
8515@example
8516ACLOCAL_AMFLAGS = -I m4
8517@end example
8518
8519@noindent
8520to your top level @file{Makefile.am}.
8521
8522If you are using GNU @code{automake} 1.10 or newer, it is even easier:
8523Add the line
8524
8525@example
8526ACLOCAL_AMFLAGS = --install -I m4
8527@end example
8528
8529@noindent
8530to your top level @file{Makefile.am}, and run @samp{aclocal --install -I m4}.
8531This will copy the needed files to the @file{m4/} subdirectory automatically,
8532before updating @file{aclocal.m4}.
8533
These macros check for the internationalization support functions
and related information.  Hopefully, once stabilized, these macros
might be integrated into the standard Autoconf set, because this
piece of @code{m4} code will be the same for all projects using GNU
@code{gettext}.
8539
8540@node config.h.in
8541@subsection @file{config.h.in} at top level
8542@cindex @file{config.h.in} file
8543
8544The include file template that holds the C macros to be defined by
8545@code{configure} is usually called @file{config.h.in} and may be
8546maintained either manually or automatically.
8547
8548If it is maintained automatically, by use of the @samp{autoheader}
8549program, you need to do nothing about it.  This is the case in particular
8550if you are using GNU @code{automake}.
8551
If it is maintained manually, it suffices to add the
following lines to @file{config.h.in}:
8554
8555@example
8556/* Define to 1 if translation of program messages to the user's
8557   native language is requested. */
8558#undef ENABLE_NLS
8559@end example
8560
8561@node Makefile
8562@subsection @file{Makefile.in} at top level
8563
8564Here are a few modifications you need to make to your main, top-level
8565@file{Makefile.in} file.
8566
8567@enumerate
8568@item
8569Add the following lines near the beginning of your @file{Makefile.in},
8570so the @samp{dist:} goal will work properly (as explained further down):
8571
8572@example
8573PACKAGE = @@PACKAGE@@
8574VERSION = @@VERSION@@
8575@end example
8576
8577@item
Wherever you process subdirectories in your @file{Makefile.in}, be sure
you also process the subdirectory @samp{po}.  Special
rules in the @file{Makefiles} take care of the case where no
internationalization is wanted.

If you are using Makefiles, either generated by automake, or hand-written
so they carefully follow the GNU coding standards, the affected goals for
which the new subdirectories must be handled include @samp{installdirs},
@samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.
8587
8588Here is an example of a canonical order of processing.  In this
8589example, we also define @code{SUBDIRS} in @code{Makefile.in} for it
8590to be further used in the @samp{dist:} goal.
8591
8592@example
8593SUBDIRS = doc lib src po
8594@end example
8595
8596@item
A delicate point is the @samp{dist:} goal, as @file{po/Makefile} will later
assume that the proper directory has been set up from the main @file{Makefile}.
Here is an example of what the @samp{dist:} goal might look like:
8600
8601@example
8602distdir = $(PACKAGE)-$(VERSION)
8603dist: Makefile
8604	rm -fr $(distdir)
8605	mkdir $(distdir)
8606	chmod 777 $(distdir)
8607	for file in $(DISTFILES); do \
8608	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
8609	done
8610	for subdir in $(SUBDIRS); do \
8611	  mkdir $(distdir)/$$subdir || exit 1; \
8612	  chmod 777 $(distdir)/$$subdir; \
8613	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
8614	done
8615	tar chozf $(distdir).tar.gz $(distdir)
8616	rm -fr $(distdir)
8617@end example
8618
8619@end enumerate
8620
8621Note that if you are using GNU @code{automake}, @file{Makefile.in} is
8622automatically generated from @file{Makefile.am}, and all needed changes
8623to @file{Makefile.am} are already made by running @samp{gettextize}.
8624
8625@node src/Makefile
8626@subsection @file{Makefile.in} in @file{src/}
8627
8628Some of the modifications made in the main @file{Makefile.in} will
8629also be needed in the @file{Makefile.in} from your package sources,
8630which we assume here to be in the @file{src/} subdirectory.  Here are
8631all the modifications needed in @file{src/Makefile.in}:
8632
8633@enumerate
8634@item
8635In view of the @samp{dist:} goal, you should have these lines near the
8636beginning of @file{src/Makefile.in}:
8637
8638@example
8639PACKAGE = @@PACKAGE@@
8640VERSION = @@VERSION@@
8641@end example
8642
8643@item
8644If not done already, you should guarantee that @code{top_srcdir}
8645gets defined.  This will serve for @code{cpp} include files.  Just add
8646the line:
8647
8648@example
8649top_srcdir = @@top_srcdir@@
8650@end example
8651
8652@item
You might also want to define @code{subdir} as @samp{src}, later
allowing for almost uniform @samp{dist:} goals in all your
@file{Makefile.in}.  At least, the @samp{dist:} goal below assumes that
you used:
8657
8658@example
8659subdir = src
8660@end example
8661
8662@item
The @code{main} function of your program will normally call
@code{bindtextdomain} (@pxref{Triggering}), like this:
8665
8666@example
8667bindtextdomain (@var{PACKAGE}, LOCALEDIR);
8668textdomain (@var{PACKAGE});
8669@end example
8670
8671On native Windows platforms, the @code{main} function may call
8672@code{wbindtextdomain} instead of @code{bindtextdomain}.
8673
8674To make LOCALEDIR known to the program, add the following lines to
8675@file{Makefile.in}:
8676
8677@example
8678datadir = @@datadir@@
datarootdir = @@datarootdir@@
8680localedir = @@localedir@@
8681DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
8682@end example
8683
8684Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, and
8685@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.
8686
8687@item
You should ensure that the final linking will use @code{@@LIBINTL@@} or
@code{@@LTLIBINTL@@} as a library.  @code{@@LIBINTL@@} is for use without
@code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}.  An
easy way to achieve this is to arrange for it to get into @code{LIBS}, like
this:
8693
8694@example
8695LIBS = @@LIBINTL@@ @@LIBS@@
8696@end example
8697
In most packages internationalized with GNU @code{gettext}, one will
find a directory @file{lib/} in which a library containing some helper
functions will be built.  (You need at least the few functions which the
GNU @code{gettext} Library itself needs.)  However, some of the functions
in @file{lib/} also give messages to the user, which of course should be
translated, too.  To take care of this, the support library (say
@file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and
@code{@@LIBS@@} in the above example.  So one has to write this:
8706
8707@example
8708LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@
8709@end example
8710
8711@item
8712Your @samp{dist:} goal has to conform with others.  Here is a
8713reasonable definition for it:
8714
8715@example
8716distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
8717dist: Makefile $(DISTFILES)
8718	for file in $(DISTFILES); do \
8719	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
8720	done
8721@end example
8722
8723@end enumerate
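
The calls from the @code{bindtextdomain} item above can be collected in a
single initialization function.  The following is only a minimal sketch:
the function name @code{init_i18n} is hypothetical, the fallback values
for @code{PACKAGE} and @code{LOCALEDIR} are assumptions that stand in for
what @file{config.h} and the @code{DEFS} line normally provide, and
@file{<libintl.h>} is included directly so that the fragment is
self-contained.  The @code{setlocale} call is included because the locale
must be selected before any message lookup can take effect.

@example
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Normally PACKAGE comes from config.h and LOCALEDIR from the
   -DLOCALEDIR=... option in DEFS.  These fallbacks are assumptions
   made only so that this sketch compiles on its own.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical helper, meant to be called early in main().
   Returns 0 on success, -1 if the message domain cannot be set up.  */
int
init_i18n (void)
@{
  /* Select the user's locale; a failure here only means that
     messages stay untranslated.  */
  setlocale (LC_ALL, "");

  /* Tell libintl where the catalogs of PACKAGE are installed.  */
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;

  /* Make PACKAGE the default domain for gettext() calls.  */
  if (textdomain (PACKAGE) == NULL)
    return -1;

  return 0;
@}
@end example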
8724
8725Note that if you are using GNU @code{automake}, @file{Makefile.in} is
8726automatically generated from @file{Makefile.am}, and the first three
8727changes and the last change are not necessary.  The remaining needed
8728@file{Makefile.am} modifications are the following:
8729
8730@enumerate
8731@item
8732To make LOCALEDIR known to the program, add the following to
8733@file{Makefile.am}:
8734
8735@example
8736<module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
8737@end example
8738
8739@noindent
8740for each specific module or compilation unit, or
8741
8742@example
8743AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
8744@end example
8745
for all modules and compilation units together.  Furthermore, if you are
using an Autoconf version older than 2.60, add this line to define
@samp{localedir}:
8749
8750@example
8751localedir = $(datadir)/locale
8752@end example
8753
8754@item
8755To ensure that the final linking will use @code{@@LIBINTL@@} or
8756@code{@@LTLIBINTL@@} as a library, add the following to
8757@file{Makefile.am}:
8758
8759@example
8760<program>_LDADD = @@LIBINTL@@
8761@end example
8762
8763@noindent
8764for each specific program, or
8765
8766@example
8767LDADD = @@LIBINTL@@
8768@end example
8769
for all programs together.  Remember that when you use @code{libtool}
to link a program, you need to use @code{@@LTLIBINTL@@} instead of
@code{@@LIBINTL@@} for that program.
8773
8774@end enumerate
8775
8776@node lib/gettext.h
8777@subsection @file{gettext.h} in @file{lib/}
8778@cindex @file{gettext.h} file
8779@cindex turning off NLS support
8780@cindex disabling NLS
8781
8782Internationalization of packages, as provided by GNU @code{gettext}, is
8783optional.  It can be turned off in two situations:
8784
8785@itemize @bullet
8786@item
8787When the installer has specified @samp{./configure --disable-nls}.  This
8788can be useful when small binaries are more important than features, for
8789example when building utilities for boot diskettes.  It can also be useful
8790in order to get some specific C compiler warnings about code quality with
8791some older versions of GCC (older than 3.0).
8792
8793@item
When the @file{libintl.h} header (with its associated libintl library, if
any) is not already installed on the system.  In this case it is preferable
that the package builds without internationalization support, rather than
failing with a compilation error.
8798@end itemize
8799
A C preprocessor macro can be used to detect these two cases.  Usually,
when @file{libintl.h} was found and NLS was not explicitly disabled, the
@code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated
configuration file (usually called @file{config.h}).  In the two negative
situations, however, this macro will not be defined, thus it will evaluate
to 0 in C preprocessor expressions.
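
As a minimal sketch of how C code can react to this macro, the fragment
below tests @code{ENABLE_NLS} with @samp{#if}, which works whether or not
the macro is defined.  The shorthand @code{_} and the function name
@code{nls_is_enabled} are illustrative assumptions, not something that
@code{configure} provides; @code{HAVE_CONFIG_H} is the usual guard for
packages that use a configuration header.

@example
#ifdef HAVE_CONFIG_H
# include <config.h>   /* defines ENABLE_NLS to 1 when NLS is enabled */
#endif

#if ENABLE_NLS
# include <libintl.h>
# define _(Msgid) gettext (Msgid)
#else
/* No-op substitute: yield the untranslated string.  */
# define _(Msgid) (Msgid)
#endif

/* Hypothetical helper: returns 1 if this build has NLS, else 0.  */
int
nls_is_enabled (void)
@{
#if ENABLE_NLS
  return 1;
#else
  return 0;
#endif
@}
@end example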
8806
8807@cindex include file @file{libintl.h}
8808@file{gettext.h} is a convenience header file for conditional use of
8809@file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro.  If
8810@code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it
8811defines no-op substitutes for the libintl.h functions.  We recommend
8812the use of @code{"gettext.h"} over direct use of @file{<libintl.h>},
8813so that portability to older systems is guaranteed and installers can
8814turn off internationalization if they want to.  In the C code, you will
8815then write
8816
8817@example
8818#include "gettext.h"
8819@end example
8820
8821@noindent
8822instead of
8823
8824@example
8825#include <libintl.h>
8826@end example
8827
8828The location of @code{gettext.h} is usually in a directory containing
8829auxiliary include files.  In many GNU packages, there is a directory
8830@file{lib/} containing helper functions; @file{gettext.h} fits there.
8831In other packages, it can go into the @file{src} directory.
8832
Do not install the @code{gettext.h} file in public locations.  Every
package that needs it should contain its own copy.
8835
8836@node autoconf macros
8837@section Autoconf macros for use in @file{configure.ac}
8838@cindex autoconf macros for @code{gettext}
8839
8840GNU @code{gettext} installs macros for use in a package's
8841@file{configure.ac} or @file{configure.in}.
8842@xref{Top, , Introduction, autoconf, The Autoconf Manual}.
8843The primary macro is, of course, @code{AM_GNU_GETTEXT}.
8844
8845@menu
8846* AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
8847* AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
8848* AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
8849* AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
8850* AM_XGETTEXT_OPTION::          AM_XGETTEXT_OPTION in @file{po.m4}
8851* AM_ICONV::                    AM_ICONV in @file{iconv.m4}
8852@end menu
8853
8854@node AM_GNU_GETTEXT
8855@subsection AM_GNU_GETTEXT in @file{gettext.m4}
8856
8857@amindex AM_GNU_GETTEXT
8858The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext
8859function family in either the C library or a separate @code{libintl}
8860library (shared or static libraries are both supported).  It also invokes
8861@code{AM_PO_SUBDIRS}, thus preparing the @file{po/} directories of the
8862package for building.
8863
@code{AM_GNU_GETTEXT} accepts optional arguments.  The general
syntax is
8866
8867@example
8868AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}])
8869@end example
8870
8871@c We don't document @var{intlsymbol} = @samp{use-libtool} here, because
8872@c it is of no use for packages other than GNU gettext itself.  (Such packages
8873@c are not allowed to install the shared libintl.  But if they use libtool,
8874@c then it is in order to install shared libraries that depend on libintl.)
8875@var{intlsymbol} should always be @samp{external}.
8876
8877If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU
8878gettext implementations (in libc or libintl) without the @code{ngettext()}
8879function will be ignored.  If @var{needsymbol} is specified and is
8880@samp{need-formatstring-macros}, then GNU gettext implementations that don't
8881support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored.
8882Only one @var{needsymbol} can be specified.  These requirements can also be
8883specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere.  To specify
8884more than one requirement, just specify the strongest one among them, or
8885invoke the @code{AM_GNU_GETTEXT_NEED} macro several times.  The hierarchy
8886among the various alternatives is as follows: @samp{need-formatstring-macros}
8887implies @samp{need-ngettext}.
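
To illustrate why a package might request @samp{need-ngettext}, here is a
minimal sketch of C code that depends on @code{ngettext}.  The function
name @code{format_file_count} and the message strings are hypothetical,
and @file{<libintl.h>} is included directly so that the fragment is
self-contained.  When no catalog is found, @code{ngettext} falls back to
the first string if the count is 1 and to the second string otherwise.

@example
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: writes a message such as "3 files" into BUF
   (of SIZE bytes) and returns the result of snprintf.  */
int
format_file_count (char *buf, size_t size, unsigned long n)
@{
  /* ngettext picks the plural form that matches N in the current
     locale; the same N is then substituted for %lu.  */
  return snprintf (buf, size, ngettext ("%lu file", "%lu files", n), n);
@}
@end example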
8888
8889The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is
8890available and should be used.  If so, it sets the @code{USE_NLS} variable
8891to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf
8892generated configuration file (usually called @file{config.h}); it sets
8893the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options
8894for use in a Makefile (@code{LIBINTL} for use without libtool,
8895@code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to
8896@code{CPPFLAGS} if necessary.  In the negative case, it sets
8897@code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL}
8898to empty and doesn't change @code{CPPFLAGS}.
8899
8900The complexities that @code{AM_GNU_GETTEXT} deals with are the following:
8901
8902@itemize @bullet
8903@item
8904@cindex @code{libintl} library
8905Some operating systems have @code{gettext} in the C library, for example
8906glibc.  Some have it in a separate library @code{libintl}.  GNU @code{libintl}
8907might have been installed as part of the GNU @code{gettext} package.
8908
8909@item
8910GNU @code{libintl}, if installed, is not necessarily already in the search
8911path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
8912the library search path).
8913
8914@item
8915Except for glibc, the operating system's native @code{gettext} cannot
8916exploit the GNU mo files, doesn't have the necessary locale dependency
8917features, and cannot convert messages from the catalog's text encoding
8918to the user's locale encoding.
8919
8920@item
8921GNU @code{libintl}, if installed, is not necessarily already in the
8922run time library search path.  To avoid the need for setting an environment
8923variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
8924run time search path options to the @code{LIBINTL} and @code{LTLIBINTL}
8925variables.  This works on most systems, but not on some operating systems
8926with limited shared library support, like SCO.
8927
8928@item
8929GNU @code{libintl} relies on POSIX/XSI @code{iconv}.  The macro checks for
8930linker options needed to use iconv and appends them to the @code{LIBINTL}
8931and @code{LTLIBINTL} variables.
8932@end itemize
8933
8934@node AM_GNU_GETTEXT_VERSION
8935@subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
8936
8937@amindex AM_GNU_GETTEXT_VERSION
8938The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of
8939the GNU gettext infrastructure that is used by the package.
8940
8941The use of this macro is optional; only the @code{autopoint} program makes
8942use of it (@pxref{Version Control Issues}).
8943
8944@node AM_GNU_GETTEXT_NEED
8945@subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4}
8946
8947@amindex AM_GNU_GETTEXT_NEED
8948The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the
8949GNU gettext implementation.  The syntax is
8950
8951@example
8952AM_GNU_GETTEXT_NEED([@var{needsymbol}])
8953@end example
8954
8955If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations
8956(in libc or libintl) without the @code{ngettext()} function will be ignored.
8957If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext
8958implementations that don't support the ISO C 99 @file{<inttypes.h>}
8959formatstring macros will be ignored.
8960
8961The optional second argument of @code{AM_GNU_GETTEXT} is also taken into
8962account.
8963
8964The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after
8965the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
8966
8967@node AM_PO_SUBDIRS
8968@subsection AM_PO_SUBDIRS in @file{po.m4}
8969
8970@amindex AM_PO_SUBDIRS
8971The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the
8972package for building.  This macro should be used in internationalized
8973programs written in other programming languages than C, C++, Objective C,
8974for example @code{sh}, @code{Python}, @code{Lisp}.  See @ref{Programming
8975Languages} for a list of programming languages that support localization
8976through PO files.
8977
8978The @code{AM_PO_SUBDIRS} macro determines whether internationalization
8979should be used.  If so, it sets the @code{USE_NLS} variable to @samp{yes},
8980otherwise to @samp{no}.  It also determines the right values for Makefile
8981variables in each @file{po/} directory.
8982
8983@node AM_XGETTEXT_OPTION
8984@subsection AM_XGETTEXT_OPTION in @file{po.m4}
8985
8986@amindex AM_XGETTEXT_OPTION
8987The @code{AM_XGETTEXT_OPTION} macro registers a command-line option to be
8988used in the invocations of @code{xgettext} in the @file{po/} directories
8989of the package.
8990
8991For example, if you have a source file that defines a function
8992@samp{error_at_line} whose fifth argument is a format string, you can use
8993@example
8994AM_XGETTEXT_OPTION([--flag=error_at_line:5:c-format])
8995@end example
8996@noindent
8997to instruct @code{xgettext} to mark all translatable strings in @samp{gettext}
8998invocations that occur as fifth argument to this function as @samp{c-format}.
8999
9000See @ref{xgettext Invocation} for the list of options that @code{xgettext}
9001accepts.
9002
9003The use of this macro is an alternative to the use of the
9004@samp{XGETTEXT_OPTIONS} variable in @file{po/Makevars}.
9005
9006@node AM_ICONV
9007@subsection AM_ICONV in @file{iconv.m4}
9008
9009@amindex AM_ICONV
9010The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI
9011@code{iconv} function family in either the C library or a separate
9012@code{libiconv} library.  If found, it sets the @code{am_cv_func_iconv}
9013variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf
9014generated configuration file (usually called @file{config.h}); it defines
9015@code{ICONV_CONST} to @samp{const} or to empty, depending on whether the
9016second argument of @code{iconv()} is of type @samp{const char **} or
9017@samp{char **}; it sets the variables @code{LIBICONV} and
9018@code{LTLIBICONV} to the linker options for use in a Makefile
9019(@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with
9020libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if
9021necessary.  If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to
9022empty and doesn't change @code{CPPFLAGS}.
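
The following minimal sketch shows how C code might use
@code{ICONV_CONST} so that the same source compiles with either
prototype of @code{iconv()}.  The function name @code{latin1_to_utf8} and
the choice of encodings are hypothetical, and the fallback definition of
@code{ICONV_CONST} is an assumption made only so that the fragment
compiles without a @file{config.h}.  On systems where @code{iconv} lives
in a separate library, the program must also be linked with the options
from @code{LIBICONV} or @code{LTLIBICONV}.

@example
#ifdef HAVE_CONFIG_H
# include <config.h>   /* defines HAVE_ICONV and ICONV_CONST */
#endif
#include <iconv.h>
#include <stddef.h>
#include <string.h>

#ifndef ICONV_CONST
# define ICONV_CONST   /* assumption: second argument is 'char **' */
#endif

/* Hypothetical helper: converts the ISO-8859-1 string IN to UTF-8,
   storing it NUL-terminated in OUT (of OUTSIZE bytes).  Returns the
   number of bytes written, not counting the NUL, or (size_t) -1.  */
size_t
latin1_to_utf8 (const char *in, char *out, size_t outsize)
@{
  ICONV_CONST char *inptr = (ICONV_CONST char *) in;
  size_t inleft = strlen (in);
  char *outptr = out;
  size_t outleft;
  size_t res;
  iconv_t cd;

  if (outsize == 0)
    return (size_t) -1;
  outleft = outsize - 1;        /* keep room for the terminating NUL */

  cd = iconv_open ("UTF-8", "ISO-8859-1");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close (cd);
  if (res == (size_t) -1)
    return (size_t) -1;

  *outptr = '\0';
  return (size_t) (outptr - out);
@}
@end example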
9023
9024The complexities that @code{AM_ICONV} deals with are the following:
9025
9026@itemize @bullet
9027@item
9028@cindex @code{libiconv} library
9029Some operating systems have @code{iconv} in the C library, for example
9030glibc.  Some have it in a separate library @code{libiconv}, for example
9031OSF/1 or FreeBSD.  Regardless of the operating system, GNU @code{libiconv}
9032might have been installed.  In that case, it should be used instead of the
9033operating system's native @code{iconv}.
9034
9035@item
9036GNU @code{libiconv}, if installed, is not necessarily already in the search
9037path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
9038the library search path).
9039
9040@item
GNU @code{libiconv} is binary incompatible with some operating systems'
native @code{iconv}, for example on FreeBSD.  Use of an @file{iconv.h}
9043and @file{libiconv.so} that don't fit together would produce program
9044crashes.
9045
9046@item
9047GNU @code{libiconv}, if installed, is not necessarily already in the
9048run time library search path.  To avoid the need for setting an environment
9049variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
9050run time search path options to the @code{LIBICONV} variable.  This works
9051on most systems, but not on some operating systems with limited shared
9052library support, like SCO.
9053@end itemize
9054
9055@file{iconv.m4} is distributed with the GNU gettext package because
9056@file{gettext.m4} relies on it.
9057
9058@node Version Control Issues
9059@section Integrating with Version Control Systems
9060
Many projects use version control systems for distributed development
and source backup.  This section gives some advice on how to manage the
uses of @code{gettextize}, @code{autopoint} and @code{autoconf} on
version controlled files.
9065
9066@menu
9067* Distributed Development::     Avoiding version mismatch in distributed development
9068* Files under Version Control::  Files to put under version control
9069* Translations under Version Control::  Put PO Files under Version Control
9070* autopoint Invocation::        Invoking the @code{autopoint} Program
9071@end menu
9072
9073@node Distributed Development
9074@subsection Avoiding version mismatch in distributed development
9075
In a project with multiple developers, there should be a
single developer who occasionally (when there is a desire to upgrade to
a new @code{gettext} version) runs @code{gettextize}, performs the
changes listed in @ref{Adjusting Files}, and then commits the changes
to the repository.
9081
9082It is highly recommended that all developers on a project use the same
9083version of GNU @code{gettext} in the package.  In other words, if a
developer runs @code{gettextize}, he should go the whole way, make the
necessary remaining changes and commit his changes to the repository.
Otherwise the following problems are likely to occur:
9087
9088@itemize @bullet
9089@item
9090Apparent version mismatch between developers.  Since some @code{gettext}
9091specific portions in @file{configure.ac}, @file{configure.in} and
9092@code{Makefile.am}, @code{Makefile.in} files depend on the @code{gettext}
9093version, the use of infrastructure files belonging to different
9094@code{gettext} versions can easily lead to build errors.
9095
9096@item
Hidden version mismatch.  Such a version mismatch can also lead to
malfunctioning of the package that may go undiscovered by the developers.
9099The worst case of hidden version mismatch is that internationalization
9100of the package doesn't work at all.
9101
9102@item
9103Release risks.  All developers implicitly perform constant testing on
9104a package.  This is important in the days and weeks before a release.
If the person who makes the release tar files uses a different version
of GNU @code{gettext} than the other developers, the distribution will
be less well tested than if all had been using the same @code{gettext}
version.  For example, it is possible that a platform specific bug goes
undiscovered because of this mismatch.
9110@end itemize
9111
9112@node Files under Version Control
9113@subsection Files to put under version control
9114
9115There are basically three ways to deal with generated files in the
9116context of a version controlled repository, such as @file{configure}
9117generated from @file{configure.ac}, @code{@var{parser}.c} generated
9118from @code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled
9119by @code{gettextize} or @code{autopoint}.
9120
9121@enumerate
9122@item
9123All generated files are always committed into the repository.
9124
9125@item
9126All generated files are committed into the repository occasionally,
9127for example each time a release is made.
9128
9129@item
9130Generated files are never committed into the repository.
9131@end enumerate
9132
9133Each of these three approaches has different advantages and drawbacks.
9134
9135@enumerate
9136@item
9137The advantage is that anyone can check out the source at any moment and
9138gets a working build.  The drawbacks are:  1a. It requires some frequent
9139"push" actions by the maintainers.  1b. The repository grows in size
9140quite fast.
9141
9142@item
The advantage is that anyone can check out the source, and the usual
@samp{./configure; make} will work.  The drawbacks are: 2a. The one who
9145checks out the repository needs tools like GNU @code{automake}, GNU
9146@code{autoconf}, GNU @code{m4} installed in his PATH; sometimes he
9147even needs particular versions of them.  2b. When a release is made
9148and a commit is made on the generated files, the other developers get
9149conflicts on the generated files when merging the local work back to
9150the repository.  Although these conflicts are easy to resolve, they
9151are annoying.
9152
9153@item
9154The advantage is less work for the maintainers.  The drawback is that
9155anyone who checks out the source not only needs tools like GNU
9156@code{automake}, GNU @code{autoconf}, GNU @code{m4} installed in his
9157PATH, but also that he needs to perform a package specific pre-build
step before being able to run @samp{./configure; make}.
9159@end enumerate
9160
9161For the first and second approach, all files modified or brought in
9162by the occasional @code{gettextize} invocation and update should be
9163committed into the repository.
9164
9165For the third approach, the maintainer can omit from the repository
9166all the files that @code{gettextize} mentions as "copy".  Instead, he
9167adds to the @file{configure.ac} or @file{configure.in} a line of the
9168form
9169
9170@example
9171AM_GNU_GETTEXT_VERSION(@value{ARCHIVE-VERSION})
9172@end example
9173
9174@noindent
9175and adds to the package's pre-build script an invocation of
9176@samp{autopoint}.  For everyone who checks out the source, this
9177@code{autopoint} invocation will copy into the right place the
9178@code{gettext} infrastructure files that have been omitted from the repository.
9179
9180The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is
9181the version of the @code{gettext} infrastructure that the package wants
9182to use.  It is also the minimum version number of the @samp{autopoint}
9183program.  So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the
9184developers can have any version >= 0.11.5 installed; the package will work
9185with the 0.11.5 infrastructure in all developers' builds.  When the
maintainer then runs @code{gettextize} from, say, version 0.12.1 on the package,
the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed
into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that
use the repository will henceforth need to have GNU @code{gettext} 0.12.1 or newer
installed.
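
The version requirement described above amounts to a numeric,
component-wise comparison of version numbers.  The following Python
sketch illustrates the idea; the helper names @code{parse_version} and
@code{satisfies} are hypothetical and not part of GNU @code{gettext}, and
the check that @code{autopoint} really performs may differ in details.

@example
```python
def parse_version(text):
    # Split "0.11.5" into the tuple (0, 11, 5) so that the
    # components compare as numbers, not as strings.
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, required):
    # True when the installed gettext is at least the version
    # named in AM_GNU_GETTEXT_VERSION.
    return parse_version(installed) >= parse_version(required)


print(satisfies("0.12.1", "0.11.5"))
print(satisfies("0.11.4", "0.11.5"))
```
@end example

Comparing tuples of integers rather than strings matters here: as
strings, "0.9" would sort after "0.11".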
9191
9192@node Translations under Version Control
9193@subsection Put PO Files under Version Control
9194
Since translations are valuable assets, just like the source code, it
makes sense to put them under version control.  The GNU @code{gettext}
infrastructure supports two ways to deal with translations in the
context of a version controlled repository.
9199
9200@enumerate
9201@item
9202Both POT file and PO files are committed into the repository.
9203
9204@item
9205Only PO files are committed into the repository.
9206
9207@end enumerate
9208
9209If a POT file is absent when building, it will be generated by
9210scanning the source files with @code{xgettext}, and then the PO files
9211are regenerated as a dependency.  On the other hand, some maintainers
9212want to keep the POT file unchanged during the development phase.  So,
9213even if a POT file is present and older than the source code, it won't
9214be updated automatically.  You can manually update it with @code{make
$(DOMAIN).pot-update}, and commit it at a certain point.
9216
Special advice for particular version control systems:
9218
9219@itemize @bullet
9220@item
Recent version control systems, Git for instance, ignore files'
timestamps.  In that case, PO files can be accidentally updated even if
the POT file is not updated.  To prevent this, you can set the
@samp{PO_DEPENDS_ON_POT} variable to @code{no} in the @file{Makevars}
file and run @code{make update-po} manually.
9226
9227@item
Location comments such as @code{#: lib/error.c:116} are sometimes
annoying, since these comments are volatile and may introduce unwanted
changes to the working copy when building.  To mitigate this, you can
decide to omit those comments from the PO files in the repository.

This is possible with the @code{--no-location} option of the
@code{msgmerge} command@footnote{You can also pass it through the
@samp{MSGMERGE_OPTIONS} variable in @file{Makevars}.}.  The drawback is
that, if the location information is needed, translators have to
recover the location comments by running @code{msgmerge} again.
9238
9239@end itemize
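
To make concrete what omitting location comments means for the text of
a PO file, here is a small Python sketch.  It only mimics the visible
effect on PO text that already exists; the function name
@code{strip_location_comments} and the sample entry are hypothetical,
and in a real package you would let @code{msgmerge} do this work.

@example
```python
def strip_location_comments(po_text):
    # Location comments are the lines that start with "#:".
    # Other comment kinds, such as "#," flags, are kept.
    lines = po_text.splitlines()
    kept = [line for line in lines if not line.startswith("#:")]
    return "\n".join(kept)


sample = "\n".join([
    "#: lib/error.c:116",
    'msgid "Unknown system error"',
    'msgstr ""',
])
print(strip_location_comments(sample))
```
@end example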
9240
9241@node autopoint Invocation
9242@subsection Invoking the @code{autopoint} Program
9243
9244@include autopoint.texi
9245
9246@node Release Management
9247@section Creating a Distribution Tarball
9248
9249@cindex release
9250@cindex distribution tarball
9251In projects that use GNU @code{automake}, the usual commands for creating
9252a distribution tarball, @samp{make dist} or @samp{make distcheck},
9253automatically update the PO files as needed.
9254
9255If GNU @code{automake} is not used, the maintainer needs to perform this
9256update before making a release:
9257
9258@example
9259$ ./configure
9260$ (cd po; make update-po)
9261$ make distclean
9262@end example
9263
9264@node Installers
9265@chapter The Installer's and Distributor's View
9266@cindex package installer's view of @code{gettext}
9267@cindex package distributor's view of @code{gettext}
9268@cindex package build and installation options
9269@cindex setting up @code{gettext} at build time
9270
By default, packages that fully use GNU @code{gettext} internally
are installed in such a way as to allow translation of
messages.  At @emph{configuration} time, those packages should
9274automatically detect whether the underlying host system already provides
9275the GNU @code{gettext} functions.  If not,
9276the GNU @code{gettext} library should be automatically prepared
9277and used.  Installers may use special options at configuration
9278time for changing this behavior.  The command @samp{./configure
9279--with-included-gettext} bypasses system @code{gettext} to
9280use the included GNU @code{gettext} instead,
9281while @samp{./configure --disable-nls}
9282produces programs totally unable to translate messages.
9283
9284@vindex LINGUAS@r{, environment variable}
Internationalized packages usually have many @file{@var{ll}.po}
files.  Unless translations are disabled, all those available are
installed together with the package.  However, the environment variable
@code{LINGUAS} may be set, prior to configuration, to limit the installed
set.  @code{LINGUAS} should then contain a space-separated list of
two-letter codes, stating which languages are allowed.
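
The effect of @code{LINGUAS} can be pictured as a simple filter over the
available translations.  The Python sketch below is only an illustration
under that assumption; @code{select_catalogs} is a hypothetical helper,
and the real selection is done by the package's configuration and build
machinery, which may match language codes more loosely.

@example
```python
def select_catalogs(available, linguas=None):
    # Without LINGUAS, every available translation is installed.
    if linguas is None:
        return list(available)
    # LINGUAS is a space-separated list of language codes.
    allowed = set(linguas.split())
    return [lang for lang in available if lang in allowed]


print(select_catalogs(["de", "fr", "ja", "sv"], "fr sv"))
```
@end example

With the values shown, only the @code{fr} and @code{sv} translations
remain in the result.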
9292
9293@node Programming Languages
9294@chapter Other Programming Languages
9295
9296While the presentation of @code{gettext} focuses mostly on C and
9297implicitly applies to C++ as well, its scope is far broader than that:
9298Many programming languages, scripting languages and other textual data
9299like GUI resources or package descriptions can make use of the gettext
9300approach.
9301
9302@menu
9303* Language Implementors::       The Language Implementor's View
9304* Programmers for other Languages::  The Programmer's View
9305* Translators for other Languages::  The Translator's View
9306* Maintainers for other Languages::  The Maintainer's View
9307* List of Programming Languages::  Individual Programming Languages
9308@end menu
9309
9310@node Language Implementors
9311@section The Language Implementor's View
9312@cindex programming languages
9313@cindex scripting languages
9314
All programming and scripting languages that have the notion of strings
are eligible for supporting @code{gettext}.  Supporting @code{gettext}
means the following:
9318
9319@enumerate
9320@item
You should add to the language a syntax for translatable strings.  In
principle, a function call of @code{gettext} would do, but a shorthand
syntax helps keep internationalized programs legible.  For
example, in C we use the syntax @code{_("string")}, and in GNU awk we use
the shorthand @code{_"string"}.
9326
9327@item
9328You should arrange that evaluation of such a translatable string at
9329runtime calls the @code{gettext} function, or performs equivalent
9330processing.
9331
9332@item
9333Similarly, you should make the functions @code{ngettext},
9334@code{dcgettext}, @code{dcngettext} available from within the language.
9335These functions are less often used, but are nevertheless necessary for
9336particular purposes: @code{ngettext} for correct plural handling, and
9337@code{dcgettext} and @code{dcngettext} for obeying other locale-related
9338environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or
9339@code{LC_MONETARY}.  For these latter functions, you need to make the
9340@code{LC_*} constants, available in the C header @code{<locale.h>},
9341referenceable from within the language, usually either as enumeration
9342values or as strings.
9343
9344@item
9345You should allow the programmer to designate a message domain, either by
9346making the @code{textdomain} function available from within the
9347language, or by introducing a magic variable called @code{TEXTDOMAIN}.
9348Similarly, you should allow the programmer to designate where to search
9349for message catalogs, by providing access to the @code{bindtextdomain}
9350function or --- on native Windows platforms --- to the @code{wbindtextdomain}
9351function.
9352
9353@item
9354You should either perform a @code{setlocale (LC_ALL, "")} call during
9355the startup of your language runtime, or allow the programmer to do so.
9356Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
9357@code{LC_CTYPE} locale categories are not both set.
9358
9359@item
9360A programmer should have a way to extract translatable strings from a
9361program into a PO file.  The GNU @code{xgettext} program is being
extended to support very different programming languages.  Please
contact the GNU @code{gettext} maintainers to help them do this.
9364The GNU @code{gettext} maintainers will need from you a formal
9365description of the lexical structure of source files.  It should
9366answer the questions:
9367@itemize @bullet
9368@item
9369What does a token look like?
9370@item
9371What does a string literal look like? What escape characters exist
9372inside a string?
9373@item
9374What escape characters exist outside of strings?  If Unicode escapes
9375are supported, are they applied before or after tokenization?
9376@item
9377What is the syntax for function calls?  How are consecutive arguments
9378in the same function call separated?
9379@item
9380What is the syntax for comments?
9381@end itemize
9382@noindent Based on this description, the GNU @code{gettext} maintainers
9383can add support to @code{xgettext}.
9384
9385If the string extractor is best integrated into your language's parser,
9386GNU @code{xgettext} can function as a front end to your string extractor.
9387
9388@item
9389The language's library should have a string formatting facility.
9390Additionally:
9391@enumerate
9392@item
9393There must be a way, in the format string, to denote the arguments by a
9394positional number or a name.  This is needed because for some languages
9395and some messages with more than one substitutable argument, the
9396translation will need to output the substituted arguments in different
9397order.  @xref{c-format Flag}.
9398@item
9399The syntax of format strings must be documented in a way that translators
9400can understand.  The GNU @code{gettext} manual will be extended to
9401include a pointer to this documentation.
9402@end enumerate
9403Based on this, the GNU @code{gettext} maintainers can add a format string
9404equivalence checker to @code{msgfmt}, so that translators get told
9405immediately when they have made a mistake during the translation of a
9406format string.
9407
9408@item
9409If the language has more than one implementation, and not all of the
9410implementations use @code{gettext}, but the programs should be portable
across implementations, you should provide a no-i18n emulation that
makes the other implementations accept programs written for yours,
without actually translating the strings.
9414
9415@item
9416To help the programmer in the task of marking translatable strings,
9417which is sometimes performed using the Emacs PO mode (@pxref{Marking}),
9418you are welcome to
9419contact the GNU @code{gettext} maintainers, so they can add support for
9420your language to @file{po-mode.el}.
9421@end enumerate
9422
9423On the implementation side, two approaches are possible, with
9424different effects on portability and copyright:
9425
9426@itemize @bullet
9427@item
9428You may link against GNU @code{gettext} functions if they are found in
9429the C library.  For example, an autoconf test for @code{gettext()} and
9430@code{ngettext()} will detect this situation.  For the moment, this test
9431will succeed on GNU systems and on Solaris 11 platforms.  No severe
9432copyright restrictions apply, except if you want to distribute statically
9433linked binaries.
9434
9435@item
9436You may emulate or reimplement the GNU @code{gettext} functionality.
9437This has the advantage of full portability and no copyright
9438restrictions, but also the drawback that you have to reimplement the GNU
9439@code{gettext} features (such as the @code{LANGUAGE} environment
9440variable, the locale aliases database, the automatic charset conversion,
9441and plural handling).
9442@end itemize
9443
9444@node Programmers for other Languages
9445@section The Programmer's View
9446
9447For the programmer, the general procedure is the same as for the C
9448language.  The Emacs PO mode marking supports other languages, and the GNU
9449@code{xgettext} string extractor recognizes other languages based on the
9450file extension or a command-line option.  In some languages,
9451@code{setlocale} is not needed because it is already performed by the
9452underlying language runtime.
9453
9454@node Translators for other Languages
9455@section The Translator's View
9456
9457The translator works exactly as in the C language case.  The only
9458difference is that when translating format strings, she has to be aware
9459of the language's particular syntax for positional arguments in format
9460strings.
9461
9462@menu
9463* c-format::                    C Format Strings
9464* objc-format::                 Objective C Format Strings
9465* python-format::               Python Format Strings
9466* java-format::                 Java Format Strings
9467* csharp-format::               C# Format Strings
9468* javascript-format::           JavaScript Format Strings
9469* scheme-format::               Scheme Format Strings
9470* lisp-format::                 Lisp Format Strings
9471* elisp-format::                Emacs Lisp Format Strings
9472* librep-format::               librep Format Strings
9473* ruby-format::                 Ruby Format Strings
9474* sh-format::                   Shell Format Strings
9475* awk-format::                  awk Format Strings
9476* lua-format::                  Lua Format Strings
9477* object-pascal-format::        Object Pascal Format Strings
9478* smalltalk-format::            Smalltalk Format Strings
9479* qt-format::                   Qt Format Strings
9480* qt-plural-format::            Qt Plural Format Strings
9481* kde-format::                  KDE Format Strings
9482* kde-kuit-format::             KUIT Format Strings
9483* boost-format::                Boost Format Strings
9484* tcl-format::                  Tcl Format Strings
9485* perl-format::                 Perl Format Strings
9486* php-format::                  PHP Format Strings
9487* gcc-internal-format::         GCC internal Format Strings
9488* gfc-internal-format::         GFC internal Format Strings
9489* ycp-format::                  YCP Format Strings
9490@end menu
9491
9492@node c-format
9493@subsection C Format Strings
9494
9495C format strings are described in POSIX (IEEE P1003.1 2001), section
9496XSH 3 fprintf(),
9497@uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}.
9498See also the fprintf() manual page,
9499@uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php},
9500@uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}.
9501
9502Although format strings with positions that reorder arguments, such as
9503
9504@example
9505"Only %2$d bytes free on '%1$s'."
9506@end example
9507
9508@noindent
9509which is semantically equivalent to
9510
9511@example
9512"'%s' has only %d bytes free."
9513@end example
9514
9515@noindent
9516are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
9517on this reordering ability: On the few platforms where @code{printf()},
9518@code{fprintf()} etc. don't support this feature natively, @file{libintl.a}
9519or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>}
9520activates these replacement functions automatically.
9521
9522@cindex outdigits
9523@cindex Arabic digits
9524As a special feature for Farsi (Persian) and maybe Arabic, translators can
9525insert an @samp{I} flag into numeric format directives.  For example, the
9526translation of @code{"%d"} can be @code{"%Id"}.  The effect of this flag,
9527on systems with GNU @code{libc}, is that in the output, the ASCII digits are
9528replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
9529category.  On other systems, the @code{gettext} function removes this flag,
9530so that it has no effect.
9531
9532Note that the programmer should @emph{not} put this flag into the
9533untranslated string.  (Putting the @samp{I} format directive flag into an
9534@var{msgid} string would lead to undefined behaviour on platforms without
9535glibc when NLS is disabled.)
9536
9537@node objc-format
9538@subsection Objective C Format Strings
9539
9540Objective C format strings are like C format strings.  They support an
9541additional format directive: "%@@", which when executed consumes an argument
9542of type @code{Object *}.
9543
9544@node python-format
9545@subsection Python Format Strings
9546
There are two kinds of format strings in Python: those acceptable to
the Python built-in format operator @code{%}, labelled as
@samp{python-format}, and those acceptable to the @code{format} method
of the @samp{str} object, labelled as @samp{python-brace-format}.
9551
9552Python @code{%} format strings are described in
9553@w{Python Library reference} /
9554@w{5. Built-in Types} /
9555@w{5.6. Sequence Types} /
@w{5.6.2. String Formatting Operations},
9557@uref{https://docs.python.org/2/library/stdtypes.html#string-formatting-operations}.
9558
9559Python brace format strings are described in @w{PEP 3101 -- Advanced
9560String Formatting}, @uref{https://www.python.org/dev/peps/pep-3101/}.
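
With Python @code{%} format strings, a translator can reorder the
arguments when the programmer uses named directives of the form
@code{%(name)s}.  A minimal sketch, in which the argument names
@code{disk} and @code{free}, the values and both message strings are made
up for the example:

@example
```python
# The values are passed by name, so their order in the string is free.
values = dict(disk="/home", free=1024)

original = "'%(disk)s' has only %(free)d bytes free."
reordered = "Only %(free)d bytes free on '%(disk)s'."

print(original % values)
print(reordered % values)
```
@end example

Both strings consume the same two arguments, so the second one could
serve as a translation of the first.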
9561
9562@node java-format
9563@subsection Java Format Strings
9564
9565There are two kinds of format strings in Java: those acceptable to the
9566@code{MessageFormat.format} function, labelled as @samp{java-format},
9567and those acceptable to the @code{String.format} and
9568@code{PrintStream.printf} functions, labelled as @samp{java-printf-format}.
9569
9570Java format strings are described in the JDK documentation for class
9571@code{java.text.MessageFormat},
9572@uref{https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html}.
9573See also the ICU documentation
9574@uref{http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html}.
9575
9576Java @code{printf} format strings are described in the JDK documentation
9577for class @code{java.util.Formatter},
9578@uref{https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html}.
9579
9580@node csharp-format
9581@subsection C# Format Strings
9582
9583C# format strings are described in the .NET documentation for class
9584@code{System.String} and in
9585@uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}.
9586
9587@node javascript-format
9588@subsection JavaScript Format Strings
9589
Although the JavaScript specification itself does not define any format
strings, many JavaScript implementations provide printf-like
functions.  @code{xgettext} understands a set of common format strings
used in popular JavaScript implementations including Gjs, Seed, and
Node.js.  In such a format string, a directive starts with @samp{%}
and is finished by a specifier: @samp{%} denotes a literal percent
sign, @samp{c} denotes a character, @samp{s} denotes a string,
@samp{b}, @samp{d}, @samp{o}, @samp{x}, @samp{X} denote an integer,
@samp{f} denotes a floating-point number, @samp{j} denotes a JSON
object.
9600
9601@node scheme-format
9602@subsection Scheme Format Strings
9603
9604Scheme format strings are documented in the SLIB manual, section
9605@w{Format Specification}.
9606
9607@node lisp-format
9608@subsection Lisp Format Strings
9609
9610Lisp format strings are described in the Common Lisp HyperSpec,
9611chapter 22.3 @w{Formatted Output},
9612@uref{http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/sec_22-3.html}.
9613
9614@node elisp-format
9615@subsection Emacs Lisp Format Strings
9616
9617Emacs Lisp format strings are documented in the Emacs Lisp reference,
9618section @w{Formatting Strings},
9619@uref{https://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}.
9620Note that as of version 21, XEmacs supports numbered argument specifications
9621in format strings while FSF Emacs doesn't.
9622
9623@node librep-format
9624@subsection librep Format Strings
9625
9626librep format strings are documented in the librep manual, section
9627@w{Formatted Output},
9628@url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output},
9629@url{http://www.gwinnup.org/research/docs/librep.html#SEC122}.
9630
9631@node ruby-format
9632@subsection Ruby Format Strings
9633
9634Ruby format strings are described in the documentation of the Ruby
9635functions @code{format} and @code{sprintf}, in
9636@uref{https://ruby-doc.org/core-2.7.1/Kernel.html#method-i-sprintf}.
9637
9638There are two kinds of format strings in Ruby:
9639@itemize @bullet
9640@item
9641Those that take a list of arguments without names.  They support
9642argument reordering by use of the @code{%@var{n}$} syntax.  Note
9643that if one argument uses this syntax, all must use this syntax.
9644@item
9645Those that take a hash table, containing named arguments.  The
9646syntax is @code{%<@var{name}>}.  Note that @code{%@{@var{name}@}} is
9647equivalent to @code{%<@var{name}>s}.
9648@end itemize
9649
9650@node sh-format
9651@subsection Shell Format Strings
9652
9653Shell format strings, as supported by GNU gettext and the @samp{envsubst}
9654program, are strings with references to shell variables in the form
9655@code{$@var{variable}} or @code{$@{@var{variable}@}}.  References of the form
9656@code{$@{@var{variable}-@var{default}@}},
9657@code{$@{@var{variable}:-@var{default}@}},
9658@code{$@{@var{variable}=@var{default}@}},
9659@code{$@{@var{variable}:=@var{default}@}},
9660@code{$@{@var{variable}+@var{replacement}@}},
9661@code{$@{@var{variable}:+@var{replacement}@}},
9662@code{$@{@var{variable}?@var{ignored}@}},
9663@code{$@{@var{variable}:?@var{ignored}@}},
9664that would be valid inside shell scripts, are not supported.  The
9665@var{variable} names must consist solely of alphanumeric or underscore
9666ASCII characters, not start with a digit and be nonempty; otherwise such
9667a variable reference is ignored.
9668
9669@node awk-format
9670@subsection awk Format Strings
9671
9672awk format strings are described in the gawk documentation, section
9673@w{Printf},
9674@uref{https://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}.
9675
9676@node lua-format
9677@subsection Lua Format Strings
9678
9679Lua format strings are described in the Lua reference manual, section @w{String Manipulation},
9680@uref{https://www.lua.org/manual/5.1/manual.html#pdf-string.format}.
9681
9682@node object-pascal-format
9683@subsection Object Pascal Format Strings
9684
9685Object Pascal format strings are described in the documentation of the
9686Free Pascal runtime library, section Format,
9687@uref{https://www.freepascal.org/docs-html/rtl/sysutils/format.html}.
9688
9689@node smalltalk-format
9690@subsection Smalltalk Format Strings
9691
9692Smalltalk format strings are described in the GNU Smalltalk documentation,
9693class @code{CharArray}, methods @samp{bindWith:} and
@samp{bindWithArguments:},
9695@uref{https://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}.
9696In summary, a directive starts with @samp{%} and is followed by @samp{%}
9697or a nonzero digit (@samp{1} to @samp{9}).
9698
9699@node qt-format
9700@subsection Qt Format Strings
9701
9702Qt format strings are described in the documentation of the QString class
9703@uref{file:/usr/lib/qt-4.3.0/doc/html/qstring.html}.
9704In summary, a directive consists of a @samp{%} followed by a digit. The same
9705directive cannot occur more than once in a format string.
9706
9707@node qt-plural-format
@subsection Qt Plural Format Strings

Qt plural format strings are described in the documentation of the QObject::tr method
9711@uref{file:/usr/lib/qt-4.3.0/doc/html/qobject.html}.
9712In summary, the only allowed directive is @samp{%n}.
9713
9714@node kde-format
9715@subsection KDE Format Strings
9716
9717KDE 4 format strings are defined as follows:
9718A directive consists of a @samp{%} followed by a non-zero decimal number.
If a directive @samp{%@var{n}} occurs in a format string, all of
@samp{%1}, ..., @samp{%(@var{n}-1)} must occur as well, except possibly
one of them.
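
The rule for KDE 4 format strings can be written down as a small check.
This Python sketch follows the description above literally;
@code{kde_directives_ok} is a hypothetical name, and the checker built
into @code{msgfmt} may handle more corner cases.

@example
```python
import re


def kde_directives_ok(fmt):
    # Collect the numbers of all directives such as %1 or %12.
    found = re.findall(r"%([1-9][0-9]*)", fmt)
    numbers = set(int(text) for text in found)
    if not numbers:
        return True
    highest = max(numbers)
    # At most one of %1 ... %(highest-1) may be missing.
    missing = [n for n in range(1, highest) if n not in numbers]
    return len(missing) <= 1


print(kde_directives_ok("%1 of %2 files copied"))
print(kde_directives_ok("%4 only"))
```
@end example

Checking only the highest directive number is sufficient, because the
directives missing below a smaller number are a subset of those missing
below the highest one.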
9721
9722@node kde-kuit-format
9723@subsection KUIT Format Strings
9724
9725KUIT (KDE User Interface Text) is compatible with KDE 4 format strings,
9726while it also allows programmers to add semantic information to a format
9727string, through XML markup tags.  For example, if the first format
9728directive in a string is a filename, programmers could indicate that
9729with a @samp{filename} tag, like @samp{<filename>%1</filename>}.
9730
9731KUIT format strings are described in
9732@uref{https://api.kde.org/frameworks/ki18n/html/prg_guide.html#kuit_markup}.
9733
9734@node boost-format
9735@subsection Boost Format Strings
9736
9737Boost format strings are described in the documentation of the
9738@code{boost::format} class, at
9739@uref{https://www.boost.org/libs/format/doc/format.html}.
9740In summary, a directive has either the same syntax as in a C format string,
9741such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as
9742@samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number
9743between percent signs, such as @samp{%1%}.
9744
9745@node tcl-format
9746@subsection Tcl Format Strings
9747
9748Tcl format strings are described in the @file{format.n} manual page,
9749@uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.
9750
9751@node perl-format
9752@subsection Perl Format Strings
9753
9754There are two kinds of format strings in Perl: those acceptable to the
9755Perl built-in function @code{printf}, labelled as @samp{perl-format},
9756and those acceptable to the @code{libintl-perl} function @code{__x},
9757labelled as @samp{perl-brace-format}.
9758
9759Perl @code{printf} format strings are described in the @code{sprintf}
9760section of @samp{man perlfunc}.
9761
Perl brace format strings are described in the
@file{Locale::TextDomain(3pm)} manual page of the CPAN package
libintl-perl.  In brief, Perl brace format strings use placeholders put
between braces (@samp{@{} and @samp{@}}).  The placeholders must have the
syntax of simple identifiers.
9767
9768@node php-format
9769@subsection PHP Format Strings
9770
9771PHP format strings are described in the documentation of the PHP function
9772@code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
9773@uref{http://www.php.net/manual/en/function.sprintf.php}.
9774
9775@node gcc-internal-format
9776@subsection GCC internal Format Strings
9777
9778These format strings are used inside the GCC sources.  In such a format
9779string, a directive starts with @samp{%}, is optionally followed by a
9780size specifier @samp{l}, an optional flag @samp{+}, another optional flag
9781@samp{#}, and is finished by a specifier: @samp{%} denotes a literal
9782percent sign, @samp{c} denotes a character, @samp{s} denotes a string,
9783@samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x}
9784denote an unsigned integer, @samp{.*s} denotes a string preceded by a
9785width specification, @samp{H} denotes a @samp{location_t *} pointer,
9786@samp{D} denotes a general declaration, @samp{F} denotes a function
9787declaration, @samp{T} denotes a type, @samp{A} denotes a function argument,
9788@samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L}
9789denotes a programming language, @samp{O} denotes a binary operator,
9790@samp{P} denotes a function parameter, @samp{Q} denotes an assignment
9791operator, @samp{V} denotes a const/volatile qualifier.
9792
9793@node gfc-internal-format
9794@subsection GFC internal Format Strings
9795
9796These format strings are used inside the GNU Fortran Compiler sources,
9797that is, the Fortran frontend in the GCC sources.  In such a format
9798string, a directive starts with @samp{%} and is finished by a
9799specifier: @samp{%} denotes a literal percent sign, @samp{C} denotes the
9800current source location, @samp{L} denotes a source location, @samp{c}
9801denotes a character, @samp{s} denotes a string, @samp{i} and @samp{d}
9802denote an integer, @samp{u} denotes an unsigned integer.  @samp{i},
9803@samp{d}, and @samp{u} may be preceded by a size specifier @samp{l}.
9804
9805@node ycp-format
9806@subsection YCP Format Strings
9807
9808YCP sformat strings are described in the libycp documentation
9809@uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}.
9810In summary, a directive starts with @samp{%} and is followed by @samp{%}
9811or a nonzero digit (@samp{1} to @samp{9}).
9812
9813
9814@node Maintainers for other Languages
9815@section The Maintainer's View
9816
9817For the maintainer, the general procedure differs from the C language
9818case:
9819
9820@itemize @bullet
9821@item
9822If only a single programming language is used, the @code{XGETTEXT_OPTIONS}
9823variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to
9824match the @code{xgettext} options for that particular programming language.
9825If the package uses more than one programming language with @code{gettext}
9826support, it becomes necessary to change the POT file construction rule
9827in @file{po/Makefile.in.in}.  It is recommended to make one @code{xgettext}
9828invocation per programming language, each with the options appropriate for
9829that language, and to combine the resulting files using @code{msgcat}.
9830@end itemize
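
The recommendation for packages that mix several programming languages
can be sketched as follows.  The Python helper below only builds the
command lines and does not run them.  The function name, the file names
and the domain @code{hello} are hypothetical, whereas @code{--language}
and @code{--output} are existing @code{xgettext} options and
@code{--output-file} is an existing @code{msgcat} option.

@example
```python
def pot_commands(domain, sources_by_language):
    # One xgettext invocation per programming language ...
    commands = []
    partial_pots = []
    for language, sources in sorted(sources_by_language.items()):
        partial = "%s-%s.pot" % (domain, language.lower())
        commands.append(
            ["xgettext", "--language=" + language, "--output=" + partial]
            + list(sources)
        )
        partial_pots.append(partial)
    # ... and one msgcat invocation that combines the partial files.
    commands.append(
        ["msgcat", "--output-file=" + domain + ".pot"] + partial_pots
    )
    return commands


sources = dict(C=["src/main.c"], Python=["tools/report.py"])
for command in pot_commands("hello", sources):
    print(" ".join(command))
```
@end example

In an actual package, the equivalent commands would go into the POT file
construction rule of @file{po/Makefile.in.in}.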
9831
9832@node List of Programming Languages
9833@section Individual Programming Languages
9834
9835@c Here is a list of programming languages, as used for Free Software projects
9836@c on SourceForge/Freshmeat, as of February 2002.  Those supported by gettext
9837@c are marked with a star.
9838@c   C                       3580     *
9839@c   Perl                    1911     *
9840@c   C++                     1379     *
9841@c   Java                    1200     *
9842@c   PHP                     1051     *
9843@c   Python                   613     *
9844@c   Unix Shell               357     *
9845@c   Tcl                      266     *
9846@c   SQL                      174
9847@c   JavaScript               118
9848@c   Assembly                 108
9849@c   Scheme                    51
9850@c   Ruby                      47
9851@c   Lisp                      45     *
9852@c   Objective C               39     *
9853@c   PL/SQL                    29
9854@c   Fortran                   25
9855@c   Ada                       24
9856@c   Delphi                    22
9857@c   Awk                       19     *
9858@c   Pascal                    19
9859@c   ML                        19
9860@c   Eiffel                    17
9861@c   Emacs-Lisp                14     *
9862@c   Zope                      14
9863@c   ASP                       12
9864@c   Forth                     12
9865@c   Cold Fusion               10
9866@c   Haskell                    9
9867@c   Visual Basic               9
9868@c   C#                         6     *
9869@c   Smalltalk                  6     *
9870@c   Basic                      5
9871@c   Erlang                     5
9872@c   Modula                     5
9873@c   Object Pascal              5     *
9874@c   Rexx                       5
9875@c   Dylan                      4
9876@c   Prolog                     4
9877@c   APL                        3
9878@c   PROGRESS                   2
9879@c   Euler                      1
9880@c   Euphoria                   1
9881@c   Pliant                     1
9882@c   Simula                     1
9883@c   XBasic                     1
9884@c   Logo                       0
9885@c   Other Scripting Engines   49
9886@c   Other                    116
9887
9888@menu
9889* C::                           C, C++, Objective C
9890* Python::                      Python
9891* Java::                        Java
9892* C#::                          C#
9893* JavaScript::                  JavaScript
9894* Scheme::                      GNU guile - Scheme
9895* Common Lisp::                 GNU clisp - Common Lisp
9896* clisp C::                     GNU clisp C sources
9897* Emacs Lisp::                  Emacs Lisp
9898* librep::                      librep
9899* Ruby::                        Ruby
9900* sh::                          sh - Shell Script
9901* bash::                        bash - Bourne-Again Shell Script
9902* gawk::                        GNU awk
9903* Lua::                         Lua
9904* Pascal::                      Pascal - Free Pascal Compiler
9905* Smalltalk::                   GNU Smalltalk
9906* Vala::                        Vala
9907* wxWidgets::                   wxWidgets library
9908* Tcl::                         Tcl - Tk's scripting language
9909* Perl::                        Perl
9910* PHP::                         PHP Hypertext Preprocessor
9911* Pike::                        Pike
9912* GCC-source::                  GNU Compiler Collection sources
9913* YCP::                         YCP - YaST2 scripting language
9914@end menu
9915
9916@include lang-c.texi
9917@include lang-python.texi
9918@include lang-java.texi
9919@include lang-csharp.texi
9920@include lang-javascript.texi
9921@include lang-scheme.texi
9922@include lang-lisp.texi
9923@include lang-clisp-c.texi
9924@include lang-elisp.texi
9925@include lang-librep.texi
9926@include lang-ruby.texi
9927@include lang-sh.texi
9928@include lang-bash.texi
9929@include lang-gawk.texi
9930@include lang-lua.texi
9931@include lang-pascal.texi
9932@include lang-smalltalk.texi
9933@include lang-vala.texi
9934@include lang-wxwidgets.texi
9935@include lang-tcl.texi
9936@include lang-perl.texi
9937@include lang-php.texi
9938@include lang-pike.texi
9939@include lang-gcc-source.texi
9940@include lang-ycp.texi
9941
9942@c This is the template for new languages.
9943@ignore
9944
9945@ node
9946@ subsection
9947
9948@table @asis
9949@item RPMs
9950
9951@item Ubuntu packages
9952
9953@item File extension
9954
9955@item String syntax
9956
9957@item gettext shorthand
9958
9959@item gettext/ngettext functions
9960
9961@item textdomain
9962
9963@item bindtextdomain
9964
9965@item setlocale
9966
9967@item Prerequisite
9968
9969@item Use or emulate GNU gettext
9970
9971@item Extractor
9972
9973@item Formatting with positions
9974
9975@item Portability
9976
9977@item po-mode marking
9978@end table
9979
9980@end ignore
9981
9982@node Data Formats
9983@chapter Other Data Formats
9984
9985While the GNU gettext tools deal mainly with POT and PO files, they can
9986also manipulate a couple of other data formats.
9987
9988@menu
9989* Internationalizable Data::    Internationalizable Data Formats
9990* Localized Data::              Localized Data Formats
9991@end menu
9992
9993@node Internationalizable Data
9994@section Internationalizable Data Formats
9995
9996Here is a list of other data formats which can be internationalized
9997using GNU gettext.
9998
9999@menu
10000* POT::                         POT - Portable Object Template
10001* RST::                         Resource String Table
10002* Glade::                       Glade - GNOME user interface description
10003* GSettings::                   GSettings - GNOME user configuration schema
10004* AppData::                     AppData - freedesktop.org application description
10005* Preparing ITS Rules::         Preparing Rules for XML Internationalization
10006@end menu
10007
10008@node POT
10009@subsection POT - Portable Object Template
10010
10011@table @asis
10012@item RPMs
10013gettext
10014
10015@item Ubuntu packages
10016gettext
10017
10018@item File extension
10019@code{pot}, @code{po}
10020
10021@item Extractor
10022@code{xgettext}
10023@end table
10024
10025@node RST
10026@subsection Resource String Table
10027@cindex RST
10028@cindex RSJ
10029
10030RST is the format of resource string table files of the Free Pascal compiler
10031versions older than 3.0.0.  RSJ is the new format of resource string table
10032files, created by the Free Pascal compiler version 3.0.0 or newer.
10033
10034@table @asis
10035@item RPMs
10036fpk
10037
10038@item Ubuntu packages
10039fp-compiler
10040
10041@item File extension
10042@code{rst}, @code{rsj}
10043
10044@item Extractor
10045@code{xgettext}, @code{rstconv}
10046@end table
10047
10048@node Glade
10049@subsection Glade - GNOME user interface description
10050
10051@table @asis
10052@item RPMs
10053glade, libglade, glade2, libglade2, intltool
10054
10055@item Ubuntu packages
10056glade, libglade2-dev, intltool
10057
10058@item File extension
10059@code{glade}, @code{glade2}, @code{ui}
10060
10061@item Extractor
10062@code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract}
10063@end table
10064
10065@node GSettings
10066@subsection GSettings - GNOME user configuration schema
10067
10068@table @asis
10069@item RPMs
10070glib2
10071
10072@item Ubuntu packages
10073libglib2.0-dev
10074
10075@item File extension
10076@code{gschema.xml}
10077
10078@item Extractor
10079@code{xgettext}, @code{intltool-extract}
10080@end table
10081
10082@node AppData
10083@subsection AppData - freedesktop.org application description
10084
10085This file format is specified in
10086@url{https://www.freedesktop.org/software/appstream/docs/}.
10087
10088@table @asis
10089@item RPMs
10090appdata-tools, appstream, libappstream-glib, libappstream-glib-builder
10091
10092@item Ubuntu packages
10093appdata-tools, appstream, libappstream-glib-dev
10094
10095@item File extension
10096@code{appdata.xml}, @code{metainfo.xml}
10097
10098@item Extractor
10099@code{xgettext}, @code{intltool-extract}, @code{itstool}
10100@end table
10101
10102@node Preparing ITS Rules
10103@subsection Preparing Rules for XML Internationalization
10104@cindex preparing rules for XML translation
10105
10106Marking translatable strings in an XML file is done through a separate
10107"rule" file, making use of the Internationalization Tag Set standard
10108(ITS, @uref{https://www.w3.org/TR/its20/}).  The currently supported ITS
10109data categories are: @samp{Translate}, @samp{Localization Note},
10110@samp{Elements Within Text}, and @samp{Preserve Space}.  In addition to
10111them, @code{xgettext} also recognizes the following extended data
10112categories:
10113
10114@table @samp
10115@item Context
10116
10117This data category associates @code{msgctxt} to the extracted text.  In
10118the global rule, the @code{contextRule} element contains the following:
10119
10120@itemize
10121@item
10122A required @code{selector} attribute.  It contains an absolute selector
10123that selects the nodes to which this rule applies.
10124
10125@item
10126A required @code{contextPointer} attribute that contains a relative
10127selector pointing to a node that holds the @code{msgctxt} value.
10128
10129@item
10130An optional @code{textPointer} attribute that contains a relative
10131selector pointing to a node that holds the @code{msgid} value.
10132@end itemize
10133
10134@item Escape Special Characters
10135
This data category indicates whether the special XML characters
(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
references.  In the global rule, the @code{escapeRule} element contains
the following:
10140
10141@itemize
10142@item
10143A required @code{selector} attribute.  It contains an absolute selector
10144that selects the nodes to which this rule applies.
10145
10146@item
10147A required @code{escape} attribute with the value @code{yes} or @code{no}.
10148@end itemize
10149
10150@item Extended Preserve Space
10151
This data category extends the standard @samp{Preserve Space} data
category with the additional values @samp{trim} and @samp{paragraph}.
@samp{trim} means to remove the leading and trailing whitespace of the
content, but not to normalize whitespace in the middle.
@samp{paragraph} means to normalize the content but keep the paragraph
boundaries.  In the global rule, the @code{preserveSpaceRule} element
contains the following:
10159
10160@itemize
10161@item
10162A required @code{selector} attribute.  It contains an absolute selector
10163that selects the nodes to which this rule applies.
10164
10165@item
10166A required @code{space} attribute with the value @code{default},
10167@code{preserve}, @code{trim}, or @code{paragraph}.
10168@end itemize
10169
10170@end table
10171
10172All those extended data categories can only be expressed with global
10173rules, and the rule elements have to have the
10174@code{https://www.gnu.org/s/gettext/ns/its/extensions/1.0} namespace.
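
As an illustration, the three extended data categories could be combined
in one rule file as sketched below.  The @code{gt} prefix, the
@code{context} attribute, and the @code{message/p} element structure are
assumptions chosen for this sketch, not requirements of @code{xgettext}:

@example
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Hypothetical: use the 'context' attribute of 'p' as msgctxt.  -->
  <gt:contextRule selector="//message/p" contextPointer="@@context"/>

  <!-- Do not escape special XML characters in the extracted text.  -->
  <gt:escapeRule selector="//message/p" escape="no"/>

  <!-- Normalize whitespace but keep paragraph boundaries.  -->
  <gt:preserveSpaceRule selector="//message/p" space="paragraph"/>
</its:rules>
@end example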
10175
10176Given the following XML document in a file @file{messages.xml}:
10177
10178@example
10179<?xml version="1.0"?>
10180<messages>
10181  <message>
10182    <p>A translatable string</p>
10183  </message>
10184  <message>
10185    <p translatable="no">A non-translatable string</p>
10186  </message>
10187</messages>
10188@end example
10189
10190To extract the first text content ("A translatable string"), but not the
10191second ("A non-translatable string"), the following ITS rules can be used:
10192
10193@example
10194<?xml version="1.0"?>
10195<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
10196  <its:translateRule selector="/messages" translate="no"/>
10197  <its:translateRule selector="//message/p" translate="yes"/>
10198
10199  <!-- If 'p' has an attribute 'translatable' with the value 'no', then
10200       the content is not translatable.  -->
10201  <its:translateRule selector="//message/p[@@translatable = 'no']"
10202    translate="no"/>
10203</its:rules>
10204@end example
10205
@samp{xgettext} needs another file, called a "locating rule" file, to
associate an ITS rule file with an XML file.  If the above ITS file is
saved as @file{messages.its}, the locating rule would look like:
10209
10210@example
10211<?xml version="1.0"?>
10212<locatingRules>
10213  <locatingRule name="Messages" pattern="*.xml">
10214    <documentRule localName="messages" target="messages.its"/>
10215  </locatingRule>
10216  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
10217</locatingRules>
10218@end example
10219
10220The @code{locatingRule} element must have a @code{pattern} attribute,
10221which denotes either a literal file name or a wildcard pattern of the
10222XML file@footnote{Note that the file name matching is done after
10223removing any @code{.in} suffix from the input file name.  Thus the
10224@code{pattern} attribute must not include a pattern matching @code{.in}.
10225For example, if the input file name is @file{foo.msg.in}, the pattern
10226should be either @code{*.msg} or just @code{*}, rather than
@code{*.in}.}.  The @code{locatingRule} element can have a child
@code{documentRule} element, which adds checks on the content of the XML
file.
10230
10231The first rule matches any file with the @file{.xml} file extension, but
10232it only applies to XML files whose root element is @samp{<messages>}.
10233
The second rule indicates that the same ITS rule file is also
applicable to any file with the @file{.msg} file extension.  The
optional @code{name} attribute of @code{locatingRule} allows choosing
rules by name, typically with @code{xgettext}'s @code{-L} option.
10238
10239The associated ITS rule file is indicated by the @code{target} attribute
10240of @code{locatingRule} or @code{documentRule}.  If it is specified in a
10241@code{documentRule} element, the parent @code{locatingRule} shouldn't
10242have the @code{target} attribute.
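
For instance, a locating rule file that uses both placements of the
@code{target} attribute might look as follows.  The file names
@file{report.its} and @file{memo.its}, the root element name
@code{report}, and the @file{.memo} extension are hypothetical:

@example
<?xml version="1.0"?>
<locatingRules>
  <!-- 'target' on the documentRule: the root element is checked.  -->
  <locatingRule name="Report" pattern="*.xml">
    <documentRule localName="report" target="report.its"/>
  </locatingRule>
  <!-- 'target' on the locatingRule: only the file name is checked.
       An input file 'notes.memo.in' matches too, since the '.in'
       suffix is removed before matching.  -->
  <locatingRule name="Memo" pattern="*.memo" target="memo.its"/>
</locatingRules>
@end example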
10243
10244Locating rule files must have the @file{.loc} file extension.  Both ITS
10245rule files and locating rule files must be installed in the
10246@file{$prefix/share/gettext/its} directory.  Once those files are
10247properly installed, @code{xgettext} can extract translatable strings
10248from the matching XML files.
10249
10250@subsubsection Two Use-cases of Translated Strings in XML
10251
For XML, there are two use-cases of translated strings.  One is the case
where the translated strings are directly consumed by programs, and the
other is the case where the translated strings are merged back into the
original XML document.  In the former case, special characters in the
extracted strings shouldn't be escaped, while they should in the latter
case.  To control whether to escape special characters, the @samp{Escape
Special Characters} data category can be used.
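
As a sketch, the choice between the two use-cases could be expressed
with a single rule in the ITS file.  The @code{gt} prefix and the
@code{//message/p} selector are assumptions taken over from the
@file{messages.xml} example above:

@example
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Strings are merged back into the XML document: escape them.
       For strings consumed directly by programs, use escape="no".  -->
  <gt:escapeRule selector="//message/p" escape="yes"/>
</its:rules>
@end example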
10259
10260To merge the translations, the @samp{msgfmt} program can be used with
10261the option @code{--xml}.  @xref{msgfmt Invocation}, for more details
10262about how one calls the @samp{msgfmt} program.  @samp{msgfmt}'s
10263@code{--xml} option doesn't perform character escaping, so translated
10264strings can have arbitrary XML constructs, such as elements for markup.
10265
10266@c This is the template for new data formats.
10267@ignore
10268
10269@ node
10270@ subsection
10271
10272@table @asis
10273@item RPMs
10274
10275@item Ubuntu packages
10276
10277@item File extension
10278
10279@item Extractor
10280@end table
10281
10282@end ignore
10283
10284@node Localized Data
10285@section Localized Data Formats
10286
10287Here is a list of file formats that contain localized data and that the
10288GNU gettext tools can manipulate.
10289
10290@menu
10291* Editable Message Catalogs::   Editable Message Catalogs
10292* Compiled Message Catalogs::   Compiled Message Catalogs
10293* Desktop Entry::               Desktop Entry files
10294* XML::                         XML files
10295@end menu
10296
10297@node Editable Message Catalogs
10298@subsection Editable Message Catalogs
10299
10300These file formats can be used with all of the @code{msg*} tools and with
10301the @code{xgettext} program.
10302
10303If you just want to convert among these formats, you can use the
10304@code{msgcat} program (with the appropriate option) or the @code{xgettext}
10305program.
10306
10307@menu
10308* PO::                          PO - Portable Object
10309* Java .properties::            Java .properties
10310* GNUstep .strings::            NeXTstep/GNUstep .strings
10311@end menu
10312
10313@node PO
10314@subsubsection PO - Portable Object
10315
10316@table @asis
10317@item File extension
10318@code{po}
10319@end table
10320
10321@node Java .properties
10322@subsubsection Java .properties
10323
10324@table @asis
10325@item File extension
10326@code{properties}
10327@end table
10328
10329@node GNUstep .strings
10330@subsubsection NeXTstep/GNUstep .strings
10331
10332@table @asis
10333@item File extension
10334@code{strings}
10335@end table
10336
10337@node Compiled Message Catalogs
10338@subsection Compiled Message Catalogs
10339
10340These file formats can be created through @code{msgfmt} and converted back
10341to PO format through @code{msgunfmt}.
10342
10343@menu
10344* MO::                          MO - Machine Object
10345* Java ResourceBundle::         Java ResourceBundle
10346* C# Satellite Assembly::       C# Satellite Assembly
10347* C# Resource::                 C# Resource
10348* Tcl message catalog::         Tcl message catalog
10349* Qt message catalog::          Qt message catalog
10350@end menu
10351
10352@node MO
10353@subsubsection MO - Machine Object
10354
10355@table @asis
10356@item File extension
10357@code{mo}
10358@end table
10359
10360See section @ref{MO Files} for details.
10361
10362@node Java ResourceBundle
10363@subsubsection Java ResourceBundle
10364
10365@table @asis
10366@item File extension
10367@code{class}
10368@end table
10369
10370For more information, see the section @ref{Java} and the examples
10371@code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing}.
10372
10373@node C# Satellite Assembly
10374@subsubsection C# Satellite Assembly
10375
10376@table @asis
10377@item File extension
10378@code{dll}
10379@end table
10380
10381For more information, see the section @ref{C#}.
10382
10383@node C# Resource
10384@subsubsection C# Resource
10385
10386@table @asis
10387@item File extension
10388@code{resources}
10389@end table
10390
10391For more information, see the section @ref{C#}.
10392
10393@node Tcl message catalog
10394@subsubsection Tcl message catalog
10395
10396@table @asis
10397@item File extension
10398@code{msg}
10399@end table
10400
10401For more information, see the section @ref{Tcl} and the examples
10402@code{hello-tcl}, @code{hello-tcl-tk}.
10403
10404@node Qt message catalog
10405@subsubsection Qt message catalog
10406
10407@table @asis
10408@item File extension
10409@code{qm}
10410@end table
10411
10412For more information, see the examples @code{hello-c++-qt} and
10413@code{hello-c++-kde}.
10414
10415@node Desktop Entry
10416@subsection Desktop Entry files
10417
10418The programmer produces a desktop entry file template with only the
10419English strings.  These strings get included in the POT file, by way of
10420@code{xgettext} (usually by listing the template in @code{po/POTFILES.in}).
10421The translators produce PO files, one for each language.  Finally, an
10422@code{msgfmt --desktop} invocation collects all the translations in the
10423desktop entry file.
10424
10425For more information, see the example @code{hello-c-gnome3}.
10426
10427@menu
10428* Icons::                       Handling icons
10429@end menu
10430
10431@node Icons
10432@subsubsection How to handle icons in Desktop Entry files
10433
10434Icons are generally locale dependent, for the following reasons:
10435
10436@itemize @bullet
10437@item
10438Icons may contain signs that are considered rude in some cultures.  For
10439example, the high-five sign, in some cultures, is perceived as an
10440unfriendly ``stop'' sign.
10441@item
Icons may contain metaphors that are culture specific.  For example, a
mailbox in the U.S. looks different from mailboxes elsewhere in the world.
10444@item
10445Icons may need to be mirrored for right-to-left locales.
10446@item
10447Icons may contain text strings (a bad practice, but anyway).
10448@end itemize
10449
10450However, icons are not covered by GNU gettext localization, because
10451@itemize @bullet
10452@item
10453Icons cannot be easily embedded in PO files,
10454@item
10455The need to localize an icon is rare, and the ability to do so in a PO
10456file would introduce translator mistakes.
10457@c https://lists.freedesktop.org/archives/xdg/2019-June/014168.html
10458@end itemize
10459
10460Desktop Entry files may contain an @samp{Icon} property, and this
10461property is localizable.  If a translator wishes to localize an icon,
10462she should do so by bypassing the normal workflow with PO files:
10463@enumerate
10464@item
10465The translator contacts the package developers directly, sending them
10466the icon appropriate for her locale, with a request to change the
10467template file.
10468@item
10469The package developers add the icon file to their repository, and a
10470line
10471@smallexample
10472Icon[@var{locale}]=@var{icon_file_name}
10473@end smallexample
10474@noindent
10475to the template file.
10476@end enumerate
10477@noindent
10478This line remains in place when this template file is merged with the
10479translators' PO files, through @code{msgfmt}.
10480
10481@node XML
10482@subsection XML files
10483
10484See the section @ref{Preparing ITS Rules} and
10485@ref{msgfmt Invocation}, subsection ``XML mode operations''.
10486
10487@node Conclusion
10488@chapter Concluding Remarks
10489
We would like to conclude this GNU @code{gettext} manual by presenting
a history of the Translation Project so far.  Finally, we give
a few pointers for those who want to do further research or readings
about Native Language Support matters.
10494
10495@menu
10496* History::                     History of GNU @code{gettext}
10497* The original ABOUT-NLS::      Historical introduction
10498* References::                  Related Readings
10499@end menu
10500
10501@node History
10502@section History of GNU @code{gettext}
10503@cindex history of GNU @code{gettext}
10504
10505Internationalization concerns and algorithms have been informally
10506and casually discussed for years in GNU, sometimes around GNU
10507@code{libc}, maybe around the incoming @code{Hurd}, or otherwise
10508(nobody clearly remembers).  And even then, when the work started for
10509real, this was somewhat independently of these previous discussions.
10510
10511This all began in July 1994, when Patrick D'Cruze had the idea and
10512initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
10513He then asked Jim Meyering, the maintainer, how to get those changes
10514folded into an official release.  That first draft was full of
10515@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
10516nicer ways.  Patrick and Jim shared some tries and experimentations
10517in this area.  Then, feeling that this might eventually have a deeper
10518impact on GNU, Jim wanted to know what standards were, and contacted
10519Richard Stallman, who very quickly and verbally described an overall
10520design for what was meant to become @code{glocale}, at that time.
10521
10522Jim implemented @code{glocale} and got a lot of exhausting feedback
10523from Patrick and Richard, of course, but also from Mitchum DSouza
10524(who wrote a @code{catgets}-like package), Roland McGrath, maybe David
10525MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
10526pulling in various directions, not always compatible, to the extent
10527that after a couple of test releases, @code{glocale} was torn apart.
10528In particular, Paul Eggert -- always keeping an eye on developments
10529in Solaris -- advocated the use of the @code{gettext} API over
10530@code{glocale}'s @code{catgets}-based API.
10531
While Jim took some distance and time and became a dad for a second
time, Roland wanted to get GNU @code{libc} internationalized, and
got Ulrich Drepper involved in that project.  Instead of starting
from @code{glocale}, Ulrich rewrote something from scratch, but
more conforming to the set of guidelines that emerged out of the
@code{glocale} effort.  Then, Ulrich got people from the previous
forum to involve themselves in this new project, and the switch
from @code{glocale} to what was first named @code{msgutils}, renamed
@code{nlsutils}, and later @code{gettext}, became officially accepted
by Richard in May 1995 or so.
10542
10543Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
10544in April 1995.  The first official release of the package, including
10545PO mode, occurred in July 1995, and was numbered 0.7.  Other people
10546contributed to the effort by providing a discussion forum around
10547Ulrich, writing little pieces of code, or testing.  These are quoted
10548in the @code{THANKS} file which comes with the GNU @code{gettext}
10549distribution.
10550
While this was being done, Fran@,{c}ois adapted half a dozen
GNU packages to @code{glocale} first, then later to @code{gettext},
10553putting them in pretest, so providing along the way an effective
10554user environment for fine tuning the evolving tools.  He also took
10555the responsibility of organizing and coordinating the Translation
10556Project.  After nearly a year of informal exchanges between people from
10557many countries, translator teams started to exist in May 1995, through
10558the creation and support by Patrick D'Cruze of twenty unmoderated
10559mailing lists for that many native languages, and two moderated
10560lists: one for reaching all teams at once, the other for reaching
10561all willing maintainers of internationalized free software packages.
10562
10563Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
10564of Greg McGary, as a kind of contribution to Ulrich's package.
10565He also gave a hand with the GNU @code{gettext} Texinfo manual.
10566
10567In 1997, Ulrich Drepper released the GNU libc 2.0, which included the
10568@code{gettext}, @code{textdomain} and @code{bindtextdomain} functions.
10569
10570In 2000, Ulrich Drepper added plural form handling (the @code{ngettext}
10571function) to GNU libc.  Later, in 2001, he released GNU libc 2.2.x,
10572which is the first free C library with full internationalization support.
10573
Ulrich being quite busy in his role of General Maintainer of GNU libc,
he handed over the GNU @code{gettext} maintenance to Bruno Haible in 2000.
Bruno added the plural form handling to the tools as well, added
support for UTF-8 and CJK locales, and wrote a few new tools for
manipulating PO files.
10579
10580@include nls.texi
10581
10582@node References
10583@section Related Readings
10584@cindex related reading
10585@cindex bibliography
10586
10587@strong{ NOTE: } This documentation section is outdated and needs to be
10588revised.
10589
10590Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
10591bibliography on internationalization matters, called
10592@cite{Internationalization Reference List}, which is available as:
10593@example
10594ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
10595@end example
10596
10597Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
10598Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
10599Internationalisation}.  This FAQ discusses writing programs which
10600can handle different language conventions, character sets, etc.;
10601and is applicable to all character set encodings, with particular
10602emphasis on @w{ISO 8859-1}.  It is regularly published in Usenet
10603groups @file{comp.unix.questions}, @file{comp.std.internat},
10604@file{comp.software.international}, @file{comp.lang.c},
10605@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
10606and @file{news.answers}.  The home location of this document is:
10607@example
10608ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
10609@end example
10610
10611Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
10612matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
10613over the responsibility of maintaining it.  It may be found as:
10614@example
10615ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
10616     ...locale-tutorial-0.8.txt.gz
10617@end example
10618@noindent
10619This site is mirrored in:
10620@example
10621ftp://ftp.ibp.fr/pub/linux/sunsite/
10622@end example
10623
10624A French version of the same tutorial should be findable at:
10625@example
10626ftp://ftp.ibp.fr/pub/linux/french/docs/
10627@end example
10628@noindent
10629together with French translations of many Linux-related documents.
10630
10631@node Language Codes
10632@appendix Language Codes
10633@cindex language codes
10634@cindex ISO 639
10635
10636The @w{ISO 639} standard defines two-letter codes for many languages, and
10637three-letter codes for more rarely used languages.
10638All abbreviations for languages used in the Translation Project should
10639come from this standard.
10640
10641@menu
10642* Usual Language Codes::        Two-letter ISO 639 language codes
10643* Rare Language Codes::         Three-letter ISO 639 language codes
10644@end menu
10645
10646@node Usual Language Codes
10647@appendixsec Usual Language Codes
10648
10649For the commonly used languages, the @w{ISO 639-1} standard defines two-letter
10650codes.
10651
10652@table @samp
10653@include iso-639.texi
10654@end table
10655
10656@node Rare Language Codes
10657@appendixsec Rare Language Codes
10658
For rarely used languages, the @w{ISO 639-2} standard defines three-letter
codes.  Here is the current list, reduced to only living languages with at least
one million speakers.
10662
10663@table @samp
10664@include iso-639-2.texi
10665@end table
10666
10667@node Country Codes
10668@appendix Country Codes
10669@cindex country codes
10670@cindex ISO 3166
10671
The @w{ISO 3166} standard defines two-character codes for many countries
10673and territories.  All abbreviations for countries used in the Translation
10674Project should come from this standard.
10675
10676@table @samp
10677@include iso-3166.texi
10678@end table
10679
10680@node Licenses
10681@appendix Licenses
10682@cindex Licenses
10683
10684The files of this package are covered by the licenses indicated in each
10685particular file or directory.  Here is a summary:
10686
10687@itemize @bullet
10688@item
10689The @code{libintl} and @code{libasprintf} libraries are covered by the
10690GNU Lesser General Public License (LGPL).
10691A copy of the license is included in @ref{GNU LGPL}.
10692
10693@item
10694The executable programs of this package and the @code{libgettextpo} library
10695are covered by the GNU General Public License (GPL).
10696A copy of the license is included in @ref{GNU GPL}.
10697
10698@item
10699This manual is free documentation.  It is dually licensed under the
10700GNU FDL and the GNU GPL.  This means that you can redistribute this
10701manual under either of these two licenses, at your choice.
10702@*
10703This manual is covered by the GNU FDL.  Permission is granted to copy,
10704distribute and/or modify this document under the terms of the
10705GNU Free Documentation License (FDL), either version 1.2 of the
10706License, or (at your option) any later version published by the
10707Free Software Foundation (FSF); with no Invariant Sections, with no
10708Front-Cover Text, and with no Back-Cover Texts.
10709A copy of the license is included in @ref{GNU FDL}.
10710@*
10711This manual is covered by the GNU GPL.  You can redistribute it and/or
10712modify it under the terms of the GNU General Public License (GPL), either
10713version 2 of the License, or (at your option) any later version published
10714by the Free Software Foundation (FSF).
10715A copy of the license is included in @ref{GNU GPL}.
10716@end itemize
10717
10718@menu
10719* GNU GPL::                     GNU General Public License
10720* GNU LGPL::                    GNU Lesser General Public License
10721* GNU FDL::                     GNU Free Documentation License
10722@end menu
10723
10724@page
10725@node GNU GPL
10726@appendixsec GNU GENERAL PUBLIC LICENSE
10727@cindex GPL, GNU General Public License
10728@cindex License, GNU GPL
10729@include gpl.texi
10730@page
10731@node GNU LGPL
10732@appendixsec GNU LESSER GENERAL PUBLIC LICENSE
10733@cindex LGPL, GNU Lesser General Public License
10734@cindex License, GNU LGPL
10735@include lgpl.texi
10736@page
10737@node GNU FDL
10738@appendixsec GNU Free Documentation License
10739@cindex FDL, GNU Free Documentation License
10740@cindex License, GNU FDL
10741@include fdl.texi
10742
10743@node Program Index
10744@unnumbered Program Index
10745
10746@printindex pg
10747
10748@node Option Index
10749@unnumbered Option Index
10750
10751@printindex op
10752
10753@node Variable Index
10754@unnumbered Variable Index
10755
10756@printindex vr
10757
10758@node PO Mode Index
10759@unnumbered PO Mode Index
10760
10761@printindex em
10762
10763@node Autoconf Macro Index
10764@unnumbered Autoconf Macro Index
10765
10766@printindex am
10767
10768@node Index
10769@unnumbered General Index
10770
10771@printindex cp
10772
10773@bye
10774
10775@c Local variables:
10776@c texinfo-column-for-description: 32
10777@c End:
10778