1\input texinfo @c -*-texinfo-*- 2@c %**start of header 3@setfilename gettext.info 4@c The @ifset makeinfo ... @end ifset conditional evaluates to true in makeinfo 5@c for info and html output, but to false in texi2html. 6@ifnottex 7@ifclear texi2html 8@set makeinfo 9@end ifclear 10@end ifnottex 11@c The @documentencoding is needed for makeinfo; texi2html 1.52 12@c doesn't recognize it. 13@ifset makeinfo 14@documentencoding UTF-8 15@end ifset 16@settitle GNU @code{gettext} utilities 17@finalout 18@c Indices: 19@c am = autoconf macro @amindex 20@c cp = concept @cindex 21@c ef = emacs function @efindex 22@c em = emacs mode @emindex 23@c ev = emacs variable @evindex 24@c fn = function @findex 25@c kw = keyword @kwindex 26@c op = option @opindex 27@c pg = program @pindex 28@c vr = variable @vindex 29@c Unused predefined indices: 30@c tp = type @tindex 31@c ky = keystroke @kindex 32@defcodeindex am 33@defcodeindex ef 34@defindex em 35@defcodeindex ev 36@defcodeindex kw 37@defcodeindex op 38@syncodeindex ef em 39@syncodeindex ev em 40@syncodeindex fn cp 41@syncodeindex kw cp 42@ifclear texi2html 43@firstparagraphindent insert 44@end ifclear 45@c %**end of header 46 47@include version.texi 48 49@ifinfo 50@dircategory GNU Gettext Utilities 51@direntry 52* gettext: (gettext). GNU gettext utilities. 53* autopoint: (gettext)autopoint Invocation. Copy gettext infrastructure. 54* envsubst: (gettext)envsubst Invocation. Expand environment variables. 55* gettextize: (gettext)gettextize Invocation. Prepare a package for gettext. 56* msgattrib: (gettext)msgattrib Invocation. Select part of a PO file. 57* msgcat: (gettext)msgcat Invocation. Combine several PO files. 58* msgcmp: (gettext)msgcmp Invocation. Compare a PO file and template. 59* msgcomm: (gettext)msgcomm Invocation. Match two PO files. 60* msgconv: (gettext)msgconv Invocation. Convert PO file to encoding. 61* msgen: (gettext)msgen Invocation. Create an English PO file. 62* msgexec: (gettext)msgexec Invocation. 
Process a PO file. 63* msgfilter: (gettext)msgfilter Invocation. Pipe a PO file through a filter. 64* msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files. 65* msggrep: (gettext)msggrep Invocation. Select part of a PO file. 66* msginit: (gettext)msginit Invocation. Create a fresh PO file. 67* msgmerge: (gettext)msgmerge Invocation. Update a PO file from template. 68* msgunfmt: (gettext)msgunfmt Invocation. Uncompile MO file into PO file. 69* msguniq: (gettext)msguniq Invocation. Unify duplicates for PO file. 70* ngettext: (gettext)ngettext Invocation. Translate a message with plural. 71* xgettext: (gettext)xgettext Invocation. Extract strings into a PO file. 72* ISO639: (gettext)Language Codes. ISO 639 language codes. 73* ISO3166: (gettext)Country Codes. ISO 3166 country codes. 74@end direntry 75@end ifinfo 76 77@ifinfo 78This file provides documentation for GNU @code{gettext} utilities. 79It also serves as a reference for the free Translation Project. 80 81@copying 82Copyright (C) 1995-1998, 2001-2020 Free Software Foundation, Inc. 83 84This manual is free documentation. It is dually licensed under the 85GNU FDL and the GNU GPL. This means that you can redistribute this 86manual under either of these two licenses, at your choice. 87 88This manual is covered by the GNU FDL. Permission is granted to copy, 89distribute and/or modify this document under the terms of the 90GNU Free Documentation License (FDL), either version 1.2 of the 91License, or (at your option) any later version published by the 92Free Software Foundation (FSF); with no Invariant Sections, with no 93Front-Cover Text, and with no Back-Cover Texts. 94A copy of the license is included in @ref{GNU FDL}. 95 96This manual is covered by the GNU GPL. You can redistribute it and/or 97modify it under the terms of the GNU General Public License (GPL), either 98version 2 of the License, or (at your option) any later version published 99by the Free Software Foundation (FSF). 
100A copy of the license is included in @ref{GNU GPL}. 101@end copying 102@end ifinfo 103 104@titlepage 105@title GNU gettext tools, version @value{VERSION} 106@subtitle Native Language Support Library and Tools 107@subtitle Edition @value{EDITION}, @value{UPDATED} 108@author Ulrich Drepper 109@author Jim Meyering 110@author Fran@,{c}ois Pinard 111@author Bruno Haible 112 113@ifnothtml 114@page 115@vskip 0pt plus 1filll 116@c @insertcopying 117Copyright (C) 1995-1998, 2001-2020 Free Software Foundation, Inc. 118 119This manual is free documentation. It is dually licensed under the 120GNU FDL and the GNU GPL. This means that you can redistribute this 121manual under either of these two licenses, at your choice. 122 123This manual is covered by the GNU FDL. Permission is granted to copy, 124distribute and/or modify this document under the terms of the 125GNU Free Documentation License (FDL), either version 1.2 of the 126License, or (at your option) any later version published by the 127Free Software Foundation (FSF); with no Invariant Sections, with no 128Front-Cover Text, and with no Back-Cover Texts. 129A copy of the license is included in @ref{GNU FDL}. 130 131This manual is covered by the GNU GPL. You can redistribute it and/or 132modify it under the terms of the GNU General Public License (GPL), either 133version 2 of the License, or (at your option) any later version published 134by the Free Software Foundation (FSF). 135A copy of the license is included in @ref{GNU GPL}. 136@end ifnothtml 137@end titlepage 138 139@c Table of Contents 140@contents 141 142@ifnottex 143@node Top 144@top GNU @code{gettext} utilities 145 146This manual documents the GNU gettext tools and the GNU libintl library, 147version @value{VERSION}. 
148 149@menu 150* Introduction:: Introduction 151* Users:: The User's View 152* PO Files:: The Format of PO Files 153* Sources:: Preparing Program Sources 154* Template:: Making the PO Template File 155* Creating:: Creating a New PO File 156* Updating:: Updating Existing PO Files 157* Editing:: Editing PO Files 158* Manipulating:: Manipulating PO Files 159* Binaries:: Producing Binary MO Files 160* Programmers:: The Programmer's View 161* Translators:: The Translator's View 162* Maintainers:: The Maintainer's View 163* Installers:: The Installer's and Distributor's View 164* Programming Languages:: Other Programming Languages 165* Data Formats:: Other Data Formats 166* Conclusion:: Concluding Remarks 167 168* Language Codes:: ISO 639 language codes 169* Country Codes:: ISO 3166 country codes 170* Licenses:: Licenses 171 172* Program Index:: Index of Programs 173* Option Index:: Index of Command-Line Options 174* Variable Index:: Index of Environment Variables 175* PO Mode Index:: Index of Emacs PO Mode Commands 176* Autoconf Macro Index:: Index of Autoconf Macros 177* Index:: General Index 178 179@detailmenu 180 --- The Detailed Node Listing --- 181 182Introduction 183 184* Why:: The Purpose of GNU @code{gettext} 185* Concepts:: I18n, L10n, and Such 186* Aspects:: Aspects in Native Language Support 187* Files:: Files Conveying Translations 188* Overview:: Overview of GNU @code{gettext} 189 190The User's View 191 192* System Installation:: Questions During Operating System Installation 193* Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs 194* Setting the POSIX Locale:: How to Specify the Locale According to POSIX 195* Working in a Windows console:: Obtaining good output in a Windows console 196* Installing Localizations:: How to Install Additional Translations 197 198Setting the Locale through Environment Variables 199 200* Locale Names:: How a Locale Specification Looks Like 201* Locale Environment Variables:: Which Environment Variable 
                                Specifies What
* The LANGUAGE variable::       How to Specify a Priority List of Languages

Preparing Program Sources

* Importing::                   Importing the @code{gettext} declaration
* Triggering::                  Triggering @code{gettext} Operations
* Preparing Strings::           Preparing Translatable Strings
* Mark Keywords::               How Marks Appear in Sources
* Marking::                     Marking Translatable Strings
* c-format Flag::               Telling something about the following string
* Special cases::               Special Cases of Translatable Strings
* Bug Report Address::          Letting Users Report Translation Bugs
* Names::                       Marking Proper Names for Translation
* Libraries::                   Preparing Library Sources

Making the PO Template File

* xgettext Invocation::         Invoking the @code{xgettext} Program

Creating a New PO File

* msginit Invocation::          Invoking the @code{msginit} Program
* Header Entry::                Filling in the Header Entry

Updating Existing PO Files

* msgmerge Invocation::         Invoking the @code{msgmerge} Program

Editing PO Files

* KBabel::                      KDE's PO File Editor
* Gtranslator::                 GNOME's PO File Editor
* PO Mode::                     Emacs's PO File Editor
* Compendium::                  Using Translation Compendia

Emacs's PO File Editor

* Installation::                Completing GNU @code{gettext} Installation
* Main PO Commands::            Main Commands
* Entry Positioning::           Entry Positioning
* Normalizing::                 Normalizing Strings in Entries
* Translated Entries::          Translated Entries
* Fuzzy Entries::               Fuzzy Entries
* Untranslated Entries::        Untranslated Entries
* Obsolete Entries::            Obsolete Entries
* Modifying Translations::      Modifying Translations
* Modifying Comments::          Modifying Comments
* Subedit::                     Mode for Editing Translations
* C Sources Context::           C Sources Context
* Auxiliary::                   Consulting Auxiliary PO Files

Using Translation Compendia

* Creating Compendia::          Merging translations for later use
* Using Compendia::             Using older translations if they fit
257 258Manipulating PO Files 259 260* msgcat Invocation:: Invoking the @code{msgcat} Program 261* msgconv Invocation:: Invoking the @code{msgconv} Program 262* msggrep Invocation:: Invoking the @code{msggrep} Program 263* msgfilter Invocation:: Invoking the @code{msgfilter} Program 264* msguniq Invocation:: Invoking the @code{msguniq} Program 265* msgcomm Invocation:: Invoking the @code{msgcomm} Program 266* msgcmp Invocation:: Invoking the @code{msgcmp} Program 267* msgattrib Invocation:: Invoking the @code{msgattrib} Program 268* msgen Invocation:: Invoking the @code{msgen} Program 269* msgexec Invocation:: Invoking the @code{msgexec} Program 270* Colorizing:: Highlighting parts of PO files 271* Other tools:: Other tools for manipulating PO files 272* libgettextpo:: Writing your own programs that process PO files 273 274Highlighting parts of PO files 275 276* The --color option:: Triggering colorized output 277* The TERM variable:: The environment variable @code{TERM} 278* The --style option:: The @code{--style} option 279* Style rules:: Style rules for PO files 280* Customizing less:: Customizing @code{less} for viewing PO files 281 282Producing Binary MO Files 283 284* msgfmt Invocation:: Invoking the @code{msgfmt} Program 285* msgunfmt Invocation:: Invoking the @code{msgunfmt} Program 286* MO Files:: The Format of GNU MO Files 287 288The Programmer's View 289 290* catgets:: About @code{catgets} 291* gettext:: About @code{gettext} 292* Comparison:: Comparing the two interfaces 293* Using libintl.a:: Using libintl.a in own programs 294* gettext grok:: Being a @code{gettext} grok 295* Temp Programmers:: Temporary Notes for the Programmers Chapter 296 297About @code{catgets} 298 299* Interface to catgets:: The interface 300* Problems with catgets:: Problems with the @code{catgets} interface?! 
301 302About @code{gettext} 303 304* Interface to gettext:: The interface 305* Ambiguities:: Solving ambiguities 306* Locating Catalogs:: Locating message catalog files 307* Charset conversion:: How to request conversion to Unicode 308* Contexts:: Solving ambiguities in GUI programs 309* Plural forms:: Additional functions for handling plurals 310* Optimized gettext:: Optimization of the *gettext functions 311 312Temporary Notes for the Programmers Chapter 313 314* Temp Implementations:: Temporary - Two Possible Implementations 315* Temp catgets:: Temporary - About @code{catgets} 316* Temp WSI:: Temporary - Why a single implementation 317* Temp Notes:: Temporary - Notes 318 319The Translator's View 320 321* Trans Intro 0:: Introduction 0 322* Trans Intro 1:: Introduction 1 323* Discussions:: Discussions 324* Organization:: Organization 325* Information Flow:: Information Flow 326* Translating plural forms:: How to fill in @code{msgstr[0]}, @code{msgstr[1]} 327* Prioritizing messages:: How to find which messages to translate first 328 329Organization 330 331* Central Coordination:: Central Coordination 332* National Teams:: National Teams 333* Mailing Lists:: Mailing Lists 334 335National Teams 336 337* Sub-Cultures:: Sub-Cultures 338* Organizational Ideas:: Organizational Ideas 339 340The Maintainer's View 341 342* Flat and Non-Flat:: Flat or Non-Flat Directory Structures 343* Prerequisites:: Prerequisite Works 344* gettextize Invocation:: Invoking the @code{gettextize} Program 345* Adjusting Files:: Files You Must Create or Alter 346* autoconf macros:: Autoconf macros for use in @file{configure.ac} 347* Version Control Issues:: 348* Release Management:: Creating a Distribution Tarball 349 350Files You Must Create or Alter 351 352* po/POTFILES.in:: @file{POTFILES.in} in @file{po/} 353* po/LINGUAS:: @file{LINGUAS} in @file{po/} 354* po/Makevars:: @file{Makevars} in @file{po/} 355* po/Rules-*:: Extending @file{Makefile} in @file{po/} 356* configure.ac:: 
@file{configure.ac} at top level 357* config.guess:: @file{config.guess}, @file{config.sub} at top level 358* mkinstalldirs:: @file{mkinstalldirs} at top level 359* aclocal:: @file{aclocal.m4} at top level 360* config.h.in:: @file{config.h.in} at top level 361* Makefile:: @file{Makefile.in} at top level 362* src/Makefile:: @file{Makefile.in} in @file{src/} 363* lib/gettext.h:: @file{gettext.h} in @file{lib/} 364 365Autoconf macros for use in @file{configure.ac} 366 367* AM_GNU_GETTEXT:: AM_GNU_GETTEXT in @file{gettext.m4} 368* AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in @file{gettext.m4} 369* AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in @file{gettext.m4} 370* AM_PO_SUBDIRS:: AM_PO_SUBDIRS in @file{po.m4} 371* AM_XGETTEXT_OPTION:: AM_XGETTEXT_OPTION in @file{po.m4} 372* AM_ICONV:: AM_ICONV in @file{iconv.m4} 373 374Integrating with Version Control Systems 375 376* Distributed Development:: Avoiding version mismatch in distributed development 377* Files under Version Control:: Files to put under version control 378* Translations under Version Control:: Put PO Files under Version Control 379* autopoint Invocation:: Invoking the @code{autopoint} Program 380 381Other Programming Languages 382 383* Language Implementors:: The Language Implementor's View 384* Programmers for other Languages:: The Programmer's View 385* Translators for other Languages:: The Translator's View 386* Maintainers for other Languages:: The Maintainer's View 387* List of Programming Languages:: Individual Programming Languages 388 389The Translator's View 390 391* c-format:: C Format Strings 392* objc-format:: Objective C Format Strings 393* python-format:: Python Format Strings 394* java-format:: Java Format Strings 395* csharp-format:: C# Format Strings 396* javascript-format:: JavaScript Format Strings 397* scheme-format:: Scheme Format Strings 398* lisp-format:: Lisp Format Strings 399* elisp-format:: Emacs Lisp Format Strings 400* librep-format:: librep Format Strings 401* 
ruby-format:: Ruby Format Strings 402* sh-format:: Shell Format Strings 403* awk-format:: awk Format Strings 404* lua-format:: Lua Format Strings 405* object-pascal-format:: Object Pascal Format Strings 406* smalltalk-format:: Smalltalk Format Strings 407* qt-format:: Qt Format Strings 408* qt-plural-format:: Qt Plural Format Strings 409* kde-format:: KDE Format Strings 410* kde-kuit-format:: KUIT Format Strings 411* boost-format:: Boost Format Strings 412* tcl-format:: Tcl Format Strings 413* perl-format:: Perl Format Strings 414* php-format:: PHP Format Strings 415* gcc-internal-format:: GCC internal Format Strings 416* gfc-internal-format:: GFC internal Format Strings 417* ycp-format:: YCP Format Strings 418 419Individual Programming Languages 420 421* C:: C, C++, Objective C 422* Python:: Python 423* Java:: Java 424* C#:: C# 425* JavaScript:: JavaScript 426* Scheme:: GNU guile - Scheme 427* Common Lisp:: GNU clisp - Common Lisp 428* clisp C:: GNU clisp C sources 429* Emacs Lisp:: Emacs Lisp 430* librep:: librep 431* Ruby:: Ruby 432* sh:: sh - Shell Script 433* bash:: bash - Bourne-Again Shell Script 434* gawk:: GNU awk 435* Lua:: Lua 436* Pascal:: Pascal - Free Pascal Compiler 437* Smalltalk:: GNU Smalltalk 438* Vala:: Vala 439* wxWidgets:: wxWidgets library 440* Tcl:: Tcl - Tk's scripting language 441* Perl:: Perl 442* PHP:: PHP Hypertext Preprocessor 443* Pike:: Pike 444* GCC-source:: GNU Compiler Collection sources 445* YCP:: YCP - YaST2 scripting language 446 447sh - Shell Script 448 449* Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization 450* gettext.sh:: Contents of @code{gettext.sh} 451* gettext Invocation:: Invoking the @code{gettext} program 452* ngettext Invocation:: Invoking the @code{ngettext} program 453* envsubst Invocation:: Invoking the @code{envsubst} program 454* eval_gettext Invocation:: Invoking the @code{eval_gettext} function 455* eval_ngettext Invocation:: Invoking the @code{eval_ngettext} function 456* 
eval_pgettext Invocation:: Invoking the @code{eval_pgettext} function 457* eval_npgettext Invocation:: Invoking the @code{eval_npgettext} function 458 459Perl 460 461* General Problems:: General Problems Parsing Perl Code 462* Default Keywords:: Which Keywords Will xgettext Look For? 463* Special Keywords:: How to Extract Hash Keys 464* Quote-like Expressions:: What are Strings And Quote-like Expressions? 465* Interpolation I:: Invalid String Interpolation 466* Interpolation II:: Valid String Interpolation 467* Parentheses:: When To Use Parentheses 468* Long Lines:: How To Grok with Long Lines 469* Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work 470 471Other Data Formats 472 473* Internationalizable Data:: Internationalizable Data Formats 474* Localized Data:: Localized Data Formats 475 476Internationalizable Data Formats 477 478* POT:: POT - Portable Object Template 479* RST:: Resource String Table 480* Glade:: Glade - GNOME user interface description 481* GSettings:: GSettings - GNOME user configuration schema 482* AppData:: AppData - freedesktop.org application description 483* Preparing ITS Rules:: Preparing Rules for XML Internationalization 484 485Localized Data Formats 486 487* Editable Message Catalogs:: Editable Message Catalogs 488* Compiled Message Catalogs:: Compiled Message Catalogs 489* Desktop Entry:: Desktop Entry files 490* XML:: XML files 491 492Editable Message Catalogs 493 494* PO:: PO - Portable Object 495* Java .properties:: Java .properties 496* GNUstep .strings:: NeXTstep/GNUstep .strings 497 498Compiled Message Catalogs 499 500* MO:: MO - Machine Object 501* Java ResourceBundle:: Java ResourceBundle 502* C# Satellite Assembly:: C# Satellite Assembly 503* C# Resource:: C# Resource 504* Tcl message catalog:: Tcl message catalog 505* Qt message catalog:: Qt message catalog 506 507Concluding Remarks 508 509* History:: History of GNU @code{gettext} 510* The original ABOUT-NLS:: Historical introduction 511* References:: Related 
Readings 512 513Language Codes 514 515* Usual Language Codes:: Two-letter ISO 639 language codes 516* Rare Language Codes:: Three-letter ISO 639 language codes 517 518Licenses 519 520* GNU GPL:: GNU General Public License 521* GNU LGPL:: GNU Lesser General Public License 522* GNU FDL:: GNU Free Documentation License 523 524@end detailmenu 525@end menu 526 527@end ifnottex 528 529@node Introduction 530@chapter Introduction 531 532This chapter explains the goals sought in the creation 533of GNU @code{gettext} and the free Translation Project. 534Then, it explains a few broad concepts around 535Native Language Support, and positions message translation with regard 536to other aspects of national and cultural variance, as they apply 537to programs. It also surveys those files used to convey the 538translations. It explains how the various tools interact in the 539initial generation of these files, and later, how the maintenance 540cycle should usually operate. 541 542@cindex sex 543@cindex he, she, and they 544@cindex she, he, and they 545In this manual, we use @emph{he} when speaking of the programmer or 546maintainer, @emph{she} when speaking of the translator, and @emph{they} 547when speaking of the installers or end users of the translated program. 548This is only a convenience for clarifying the documentation. It is 549@emph{absolutely} not meant to imply that some roles are more appropriate 550to males or females. Besides, as you might guess, GNU @code{gettext} 551is meant to be useful for people using computers, whatever their sex, 552race, religion or nationality! 553 554@cindex bug report address 555Please submit suggestions and corrections 556@itemize @bullet 557@item 558either in the bug tracker at @url{https://savannah.gnu.org/projects/gettext} 559@item 560or by email to @code{bug-gettext@@gnu.org}. 561@end itemize 562 563@noindent 564Please include the manual's edition number and update date in your messages. 

@menu
* Why::                         The Purpose of GNU @code{gettext}
* Concepts::                    I18n, L10n, and Such
* Aspects::                     Aspects in Native Language Support
* Files::                       Files Conveying Translations
* Overview::                    Overview of GNU @code{gettext}
@end menu

@node Why
@section The Purpose of GNU @code{gettext}

Usually, programs are written and documented in English, and use
English at execution time to interact with users. This is true
not only of GNU software, but also of a great deal of proprietary
and free software. Using a common language is quite handy for
communication between developers, maintainers and users from all
countries. On the other hand, most people are less comfortable with
English than with their own native language, and would prefer to
use their mother tongue for day-to-day work, as far as possible.
Many would simply @emph{love} to see their computer screen showing
a lot less English, and far more of their own language.

@cindex Translation Project
However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it. They have no confidence at all that the dream might ever
come true. Yet some have not lost hope, and have organized themselves.
The Translation Project is a formalization of this hope into a
workable structure, which has a good chance of bringing all of us nearer
to a truly multi-lingual set of programs.

GNU @code{gettext} is an important step for the Translation Project,
as it is an asset on which we may build many other steps. This package
offers programmers, translators and even users a well-integrated
set of tools and documentation. Specifically, the GNU @code{gettext}
utilities are a set of tools that provides a framework within which
other free packages may produce multi-lingual messages.
These tools include

@itemize @bullet
@item
A set of conventions about how programs should be written to support
message catalogs.

@item
A directory and file naming organization for the message catalogs
themselves.

@item
A runtime library supporting the retrieval of translated messages.

@item
A few stand-alone programs to massage in various ways the sets of
translatable strings, or already translated strings.

@item
A library supporting the parsing and creation of files containing
translated messages.

@item
A special mode for Emacs@footnote{In this manual, all mentions of Emacs
refer to either GNU Emacs or XEmacs, which people sometimes call FSF
Emacs and Lucid Emacs, respectively.} which helps in preparing these sets
and bringing them up to date.
@end itemize

GNU @code{gettext} is designed to minimize the impact of
internationalization on program sources, keeping this impact as small
and hardly noticeable as possible. Internationalization has better
chances of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.

The Translation Project also uses the GNU @code{gettext} distribution
as a vehicle for documenting its structure and methods. This goes
beyond the strict technicalities of documenting GNU @code{gettext}
proper. This way, translators will find in a single place, as
far as possible, all they need to know for properly doing their
translating work. This supplemental documentation might also
help programmers, and even curious users, in understanding how GNU
@code{gettext} is related to the remainder of the Translation
Project, and consequently, get a glimpse of the @emph{big picture}.

@node Concepts
@section I18n, L10n, and Such

@cindex i18n
@cindex l10n
Two long words appear all the time when we discuss support of native
language in programs, and these words have a precise meaning, worth
being explained here, once and for all in this document. The words are
@emph{internationalization} and @emph{localization}. Many people,
tired of writing these long words over and over again, have taken up the
habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first
and last letter of each word, and replacing the run of intermediate
letters by a number merely telling how many such letters there are.
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time@dots{}

@cindex internationalization
By @dfn{internationalization}, one refers to the operation by which a
program, or a set of programs turned into a package, is made aware of and
able to support multiple languages. This is a generalization process,
by which the programs are untied from calling only English strings or
other English-specific habits, and connected to generic ways of doing
the same, instead. Program developers may use various techniques to
internationalize their programs. Some of these have been standardized.
GNU @code{gettext} offers one of these standards. @xref{Programmers}.

@cindex localization
By @dfn{localization}, one means the operation by which, in a set
of programs already internationalized, one gives the program all
needed information so that it can adapt itself to handle its input
and output in a fashion which is correct for some native language and
cultural habits. This is a particularization process, by which generic
methods already implemented in an internationalized program are used
in specific ways. The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
country, together with all associated translations targeted to the
same native language, is called the @dfn{locale} for this language
or country. Users achieve localization of programs by setting
special environment variables to proper values, prior to executing those
programs, identifying which locale should be used.

In fact, locale message support is only one component of the cultural
data that makes up a particular locale. There is a whole host of
routines and functions provided to aid programmers in developing
internationalized software and which allow them to access the data
stored in a particular locale. When someone presently refers to a
particular locale, they are obviously referring to the data stored
within that particular locale. Similarly, if a programmer is referring
to ``accessing the locale routines'', they are referring to the
complete suite of routines that access all of the locale's information.

@cindex NLS
@cindex Native Language Support
@cindex Natural Language Support
One uses the expression @dfn{Native Language Support}, or merely NLS,
for speaking of the overall activity or feature encompassing both
internationalization and localization, allowing for multi-lingual
interactions in a program. In a nutshell, one could say that
internationalization is the operation by which further localizations
are made possible.

Also, very roughly said, when it comes to multi-lingual messages,
internationalization is usually taken care of by programmers, and
localization is usually taken care of by translators.

@node Aspects
@section Aspects in Native Language Support

@cindex translation aspects
For a totally multi-lingual distribution, there are many things to
translate beyond output messages.

@itemize @bullet
@item
As of today, GNU @code{gettext} offers a complete toolset for
translating messages output by C programs. Perl scripts and shell
scripts will also need to be translated. Even if there are today some hooks
by which this can be done, these hooks are not integrated as well as they
should be.

@item
Some programs, like @code{autoconf} or @code{bison}, are able
to produce other programs (or scripts). Even if the generating
programs themselves are internationalized, the programs they
produce may need internationalization on their own, and this indirect
internationalization could be automated right from the generating
program. In fact, quite usually, generating and generated programs
could be internationalized independently, as the effort needed is
fairly orthogonal.

@item
A few programs include textual tables which might need translation
themselves, independently of the strings contained in the program
itself. For example, @w{RFC 1345} gives an English description for each
character which the @code{recode} program is able to reconstruct at execution.
Since these descriptions are extracted from the RFC by mechanical means,
translating them properly would require a prior translation of the RFC
itself.

@item
Almost all programs accept options, which are often worded so as to
be descriptive for English readers; one might want to consider
offering translated versions for program options as well.

@item
Many programs read, interpret, compile, or are somewhat driven by
input files which are texts containing keywords, identifiers, or
replies which are inherently translatable. For example, one may want
@code{gcc} to allow diacriticized characters in identifiers or use
translated keywords; @samp{rm -i} might accept something other than
@samp{y} or @samp{n} for replies, etc.
Even if the program will
eventually produce most of its output in foreign languages, one has
to decide whether the input syntax, option values, etc., are to be
localized or not.

@item
The manual accompanying a package, as well as all documentation files
in the distribution, could surely be translated, too. Translating a
manual, with the intent of later keeping up with updates, is generally
a major undertaking in itself.

@end itemize

As we already stressed, translation is only one aspect of locales.
Other internationalization aspects are system services and are handled
in GNU @code{libc}. There
are many attributes that are needed to define a country's cultural
conventions. These attributes include, besides the country's native
language, the formatting of the date and time, the representation of
numbers, the symbols for currency, etc. These local @dfn{rules} are
termed the country's locale. The locale represents the knowledge
needed to support the country's native attributes.

@cindex locale categories
There are a few major areas which may vary between countries and
hence, define what a locale must describe. The following list helps
put multi-lingual messages into the proper context of other tasks
related to locales. See the GNU @code{libc} manual for details.

@table @emph

@item Characters and Codesets
@cindex codeset
@cindex encoding
@cindex character encoding
@cindex locale category, LC_CTYPE

The codeset most commonly used throughout the USA and most English
speaking parts of the world is the ASCII codeset. However, there are
many characters needed by various locales that are not found within
this codeset. The 8-bit @w{ISO 8859-1} codeset has most of the special
characters needed to handle the major European languages.
However, in
many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it
doesn't even handle the major European currency. Hence each locale
will need to specify which codeset it needs to use and will need
to have the appropriate character handling routines to cope with
the codeset.

@item Currency
@cindex currency symbols
@cindex locale category, LC_MONETARY

The symbols used vary from country to country as does the position
used by the symbol. Software needs to be able to transparently
display currency figures in the native mode for each locale.

@item Dates
@cindex date format
@cindex locale category, LC_TIME

The format of dates varies between locales. For example, Christmas day
in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
Other countries might use @w{ISO 8601} dates, etc.

Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
or otherwise. Some locales require time to be specified in 24-hour
mode rather than as AM or PM. Further, the nature and yearly extent
of the Daylight Saving correction vary widely between countries.

@item Numbers
@cindex number format
@cindex locale category, LC_NUMERIC

Numbers can be represented differently in different locales.
For example, the following numbers are all written correctly for
their respective locales:

@example
12,345.67       English
12.345,67       German
 12345,67       French
1,2345.67       Asia
@end example

Some programs could go further and use different unit systems, like
English units or Metric units, or even take into account variants
about how numbers are spelled in full.

@item Messages
@cindex messages
@cindex locale category, LC_MESSAGES

The most obvious area is the language support within a locale.
This is
where GNU @code{gettext} provides the means for developers and users to
easily change the language that the software uses to communicate with
the user.

@end table

@cindex locale categories
These areas of cultural conventions are called @emph{locale categories}.
It is an unfortunate term; @emph{locale aspects} or @emph{locale feature
categories} would be better terms, because each ``locale category''
describes an area or task that requires localization. The concrete data
that describes the cultural conventions for such an area and for a particular
culture is also called a @emph{locale category}. In this sense, a locale
is composed of several locale categories: the locale category describing
the codeset, the locale category describing the formatting of numbers,
the locale category containing the translated messages, and so on.

@cindex Linux
Components of locale outside of message handling are standardized in
the ISO C standard and the POSIX:2001 standard (also known as the SUSV3
specification). GNU @code{libc}
fully implements this, and most other modern systems provide more
or less reasonable support for at least some of the missing components.

@node Files
@section Files Conveying Translations

@cindex files, @file{.po} and @file{.mo}
The letters PO in @file{.po} files mean Portable Object, to
distinguish them from @file{.mo} files, where MO stands for Machine
Object. This paradigm, as well as the PO file format, is inspired
by the NLS standard developed by Uniforum, and first implemented by
Sun in their Solaris system.

PO files are meant to be read and edited by humans, and associate each
original, translatable string of a given package with its translation
in a particular target language. A single PO file is dedicated to
a single target language.
If a package supports many languages,
there is one such PO file per language supported, and each package
has its own set of PO files. These PO files are best created by
the @code{xgettext} program, and later updated or refreshed through
the @code{msgmerge} program. Program @code{xgettext} extracts all
marked messages from a set of C files and initializes a PO file with
empty translations. Program @code{msgmerge} takes care of adjusting
PO files between releases of the corresponding sources, commenting out
obsolete entries, initializing new ones, and updating all source
line references. Files ending with @file{.pot} are a kind of base
translation file found in distributions, in PO file format.

MO files are meant to be read by programs, and are binary in nature.
A few systems already offer tools for creating and handling MO files
as part of the Native Language Support coming with the system, but the
format of these MO files is often different from system to system,
and non-portable. The tools already provided with these systems don't
support all the features of GNU @code{gettext}. Therefore GNU
@code{gettext} uses its own format for MO files. Files ending with
@file{.gmo} are really MO files, when it is known that these files use
the GNU format.

@node Overview
@section Overview of GNU @code{gettext}

@cindex overview of @code{gettext}
@cindex big picture
@cindex tutorial of @code{gettext} usage
The following diagram summarizes the relation between the files
handled by GNU @code{gettext} and the tools acting on these files.
It is followed by somewhat detailed explanations, which you should
read while keeping an eye on the diagram. Having a clear understanding
of these interrelations will surely help programmers, translators
and maintainers.

@ifhtml
@example
@group
Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯
@end group
@end example
@end ifhtml
@ifnothtml
@example
@group
Original C Sources ---> Preparation ---> Marked C Sources ---.
                                                             |
              .---------<--- GNU gettext Library             |
.--- make <---+                                              |
|             `---------<--------------------+---------------'
|                                            |
|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
|   |                                            |              ^
|   |                                            `---.          |
|   `---.                                            +---> PO editor ---.
|       +----> msgmerge ------> LANG.po ---->--------'                  |
|   .---'                                                               |
|   |                                                                   |
|   `-------------<---------------.                                     |
|                                 +--- New LANG.po <--------------------'
|   .--- LANG.gmo <--- msgfmt <---'
|   |
|   `---> install ---> /.../LANG/PACKAGE.mo ---.
|                                              +---> "Hello world!"
`-------> install ---> /.../bin/PROGRAM -------'
@end group
@end example
@end ifnothtml

@cindex marking translatable strings
As a programmer, the first step to bringing GNU @code{gettext}
into your package is identifying, right in the C sources, those strings
which are meant to be translatable, and those which are untranslatable.
This tedious job can be done a little more comfortably using emacs PO
mode, but you can use any means familiar to you for modifying your
C sources.
Besides this, some other simple, standard changes are needed to
properly initialize the translation library. @xref{Sources}, for
more information about all this.

For newly written software the strings of course can and should be
marked while writing it. The @code{gettext} approach makes this
very easy. Simply put the following lines at the beginning of each file
or in a central header file:

@example
@group
#define _(String) (String)
#define N_(String) String
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)
@end group
@end example

@noindent
Doing this allows you to prepare the sources for internationalization.
Later, when you feel ready for the step to use the @code{gettext} library,
simply replace these definitions by the following:

@cindex include file @file{libintl.h}
@example
@group
#include <libintl.h>
#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
@end group
@end example

@cindex link with @file{libintl}
@cindex Linux
@noindent
and link against @file{libintl.a} or @file{libintl.so}. Note that on
GNU systems, you don't need to link with @code{libintl} because the
@code{gettext} library functions are already contained in GNU libc.
That is all you have to change.

@cindex template PO file
@cindex files, @file{.pot}
Once the C sources have been modified, the @code{xgettext} program
is used to find and extract all translatable strings, and create a
PO template file out of all these. This @file{@var{package}.pot} file
contains all original program strings. It has sets of pointers to
exactly where in C sources each string is used. All translations
are set to empty. The letter @code{t} in @file{.pot} marks this as
a Template PO file, not yet oriented towards any particular language.
@xref{xgettext Invocation}, for more details about how one calls the
@code{xgettext} program. If you are @emph{really} lazy, you might
be interested in working a lot more right away, and preparing the
whole distribution setup (@pxref{Maintainers}). By doing so, you
spare yourself typing the @code{xgettext} command, as @code{make}
should now generate the proper things automatically for you!

The first time through, there is no @file{@var{lang}.po} yet, so the
@code{msgmerge} step may be skipped and replaced by a mere copy of
@file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang}
represents the target language. See @ref{Creating} for details.

Then comes the initial translation of messages. Translation in
itself is a whole matter, still exclusively meant for humans,
and whose complexity far overwhelms the level of this manual.
Nevertheless, a few hints are given in some other chapter of this
manual (@pxref{Translators}). You will also find there indications
about how to contact translating teams, or become part of them,
for sharing your translating concerns with others who target the same
native language.

While adding the translated messages into the @file{@var{lang}.po}
PO file, if you are not using one of the dedicated PO file editors
(@pxref{Editing}), you are on your own
for ensuring that your efforts fully respect the PO file format, and quoting
conventions (@pxref{PO Files}). This is surely not an impossible task,
as this is the way many people handled PO files around 1995.
On the other hand, by using a PO file editor, most details
of PO file format are taken care of for you, but you have to acquire
some familiarity with the PO file editor itself.
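
Returning for a moment to the programmer's side of the steps above,
the marking and initialization can be pictured with a small C fragment.
It is only a sketch under stated assumptions: the function names
@code{init_translations} and @code{greeting}, the @code{PACKAGE} value
@samp{myprog} and the @code{LOCALEDIR} path are hypothetical, and
@code{ENABLE_NLS} is assumed to be defined by the build system when the
@code{gettext} library is to be used. Without it, the markers compile
to no-ops, as in the first set of definitions shown earlier.

@example
#include <locale.h>

#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
/* Preparation phase: the markers compile to no-ops.  */
# define _(String) (String)
# define textdomain(Domain)
# define bindtextdomain(Package, Directory)
#endif

/* Hypothetical values, for illustration only.  */
#define PACKAGE "myprog"
#define LOCALEDIR "/usr/share/locale"

/* Select the user's locale and the message catalog of PACKAGE.  */
void
init_translations (void)
@{
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
@}

/* Return the greeting.  The string is marked with _() so that
   xgettext, called with --keyword=_, can extract it.  */
const char *
greeting (void)
@{
  return _("Hello world!");
@}
@end example

@noindent
A @code{main} function would call @code{init_translations ()} once at
startup, before the first call to @code{greeting ()}.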

If some common translations have already been saved into a compendium
PO file, translators may use PO mode for initializing untranslated
entries from the compendium, and also save selected translations into
the compendium, updating it (@pxref{Compendium}). Compendium files
are meant to be exchanged between members of a given translation team.

Programs, or packages of programs, are dynamic in nature: users write
bug reports and suggestions for improvements, maintainers react by
modifying programs in various ways. The fact that a package has
already been internationalized should not make maintainers shy
of adding new strings, or modifying strings already translated.
They just do their job the best they can. For the Translation
Project to work smoothly, it is important that maintainers do not
carry translation concerns on their already loaded shoulders, and that
translators be kept as free as possible of programming concerns.

The only concern maintainers should have is carefully marking new
strings as translatable, when they should be, and not otherwise
worrying about them being translated, as this will come in proper time.
Consequently, when programs and their strings are adjusted in various
ways by maintainers, and for matters usually unrelated to translation,
@code{xgettext} would construct @file{@var{package}.pot} files which are
evolving over time, so the translations carried by @file{@var{lang}.po}
are slowly fading out of date.

@cindex evolution of packages
It is important for translators (and even maintainers) to understand
that package translation is a continuous process in the lifetime of a
package, and not something which is done once and for all at the start.
After an initial burst of translation activity for a given package,
interventions are needed once in a while, because here and there,
translated entries become obsolete, and new untranslated entries
appear, needing translation.

The @code{msgmerge} program has the purpose of refreshing an already
existing @file{@var{lang}.po} file, by comparing it with a newer
@file{@var{package}.pot} template file, extracted by @code{xgettext}
out of recent C sources. The refreshing operation adjusts all
references to C source locations for strings, since these strings
move as programs are modified. Also, @code{msgmerge} comments out as
obsolete, in @file{@var{lang}.po}, those already translated entries
which are no longer used in the program sources (@pxref{Obsolete
Entries}). It finally discovers new strings and inserts them in
the resulting PO file as untranslated entries (@pxref{Untranslated
Entries}). @xref{msgmerge Invocation}, for more information about what
@code{msgmerge} really does.

Whatever route or means taken, the goal is to obtain an updated
@file{@var{lang}.po} file offering translations for all strings.

The temporal mobility, or fluidity of PO files, is an integral part of
the translation game, and should be well understood, and accepted.
People resisting it will have a hard time participating in the
Translation Project, or will give a hard time to other participants! In
particular, maintainers should relax and include all available official
PO files in their distributions, even if these have not recently been
updated, without exerting pressure on the translator teams to get the
job done. The pressure should rather come
from the community of users speaking a particular language, and
maintainers should consider themselves fairly relieved of any concern
about the adequacy of translation files.
On the other hand, translators
should reasonably try updating the PO files they are responsible for,
while the package is undergoing pretest, prior to an official
distribution.

Once the PO file is complete and dependable, the @code{msgfmt} program
is used for turning the PO file into a machine-oriented format, which
may yield efficient retrieval of translations by the programs of the
package, whenever needed at runtime (@pxref{MO Files}). @xref{msgfmt
Invocation}, for more information about all modes of execution
for the @code{msgfmt} program.

Finally, the modified and marked C sources are compiled and linked
with the GNU @code{gettext} library, usually through the operation of
@code{make}, given a suitable @file{Makefile} exists for the project,
and the resulting executable is installed somewhere users will find it.
The MO files themselves should also be properly installed. Given the
appropriate environment variables are set (@pxref{Setting the POSIX Locale}),
the program should localize itself automatically, whenever it executes.

The remainder of this manual has the purpose of explaining in depth the various
steps outlined above.

@node Users
@chapter The User's View

Nowadays, when users log into a computer, they usually find that all
their programs show messages in their native language -- at least for
users of languages with an active free software community, like French or
German; to a lesser extent for languages with a smaller participation in
free software and the GNU project, like Hindi and Filipino.

How does this work? How can the user influence the language that is used
by the programs? This chapter answers these questions.

@menu
* System Installation:: Questions During Operating System Installation
* Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs
* Setting the POSIX Locale:: How to Specify the Locale According to POSIX
* Working in a Windows console:: Obtaining good output in a Windows console
* Installing Localizations:: How to Install Additional Translations
@end menu

@node System Installation
@section Operating System Installation

The default language is often already specified during operating system
installation. When the operating system is installed, the installer
typically asks for the language used for the installation process and,
separately, for the language to use in the installed system. Some OS
installers only ask for the language once.

This determines the system-wide default language for all users. But the
installers often give the possibility to install extra localizations for
additional languages. For example, the localizations of KDE (the K
Desktop Environment) and OpenOffice.org are often bundled separately,
as one installable package per language.

At this point it is good to consider the intended use of the machine: If
it is a machine designated for personal use, additional localizations are
probably not necessary. If, however, the machine is in use in an
organization or company that has international relationships, one can
consider the needs of guest users. If you have a guest from abroad, for
a week, what could be his preferred locales? It may be worth installing
these additional localizations ahead of time, since they cost only a bit
of disk space at this point.

The system-wide default language is the locale configuration that is used
when a new user account is created.
But the user can have his own locale
configuration that is different from the one of the other users of the
same machine. He can specify it, typically after the first login, as
described in the next section.

@node Setting the GUI Locale
@section Setting the Locale Used by GUI Programs

The immediately available programs in a user's desktop come from a group
of programs called a ``desktop environment''; it usually includes the window
manager, a web browser, a text editor, and more. The most common free
desktop environments are KDE, GNOME, and Xfce.

The locale used by GUI programs of the desktop environment can be specified
in a configuration screen called ``control center'', ``language settings''
or ``country settings''.

Individual GUI programs that are not part of the desktop environment can
have their locale specified either in a settings panel, or through environment
variables.

For some programs, it is possible to specify the locale through environment
variables, possibly even to a different locale than the desktop's locale.
This means, instead of starting a program through a menu or from the file
system, you can start it from the command-line, after having set some
environment variables. The environment variables can be those specified
in the next section (@ref{Setting the POSIX Locale}); for some versions of
KDE, however, the locale is specified through a variable @code{KDE_LANG},
rather than @code{LANG} or @code{LC_ALL}.

@node Setting the POSIX Locale
@section Setting the Locale through Environment Variables

As a user, if your language has been installed for this package, in the
simplest case, you only have to set the @code{LANG} environment variable
to the appropriate @samp{@var{ll}_@var{CC}} combination. For example,
let's suppose that you speak German and live in Germany.
At the shell
prompt, merely execute
@w{@samp{setenv LANG de_DE}} (in @code{csh}),
@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}) or
@w{@samp{export LANG=de_DE}} (in @code{bash}). This can be done from your
@file{.login} or @file{.profile} file, once and for all.

@menu
* Locale Names:: What a Locale Specification Looks Like
* Locale Environment Variables:: Which Environment Variable Specifies What
* The LANGUAGE variable:: How to Specify a Priority List of Languages
@end menu

@node Locale Names
@subsection Locale Names

A locale name usually has the form @samp{@var{ll}_@var{CC}}. Here
@samp{@var{ll}} is an @w{ISO 639} two-letter language code, and
@samp{@var{CC}} is an @w{ISO 3166} two-letter country code. For example,
for German in Germany, @var{ll} is @code{de}, and @var{CC} is @code{DE}.
You find a list of the language codes in appendix @ref{Language Codes} and
a list of the country codes in appendix @ref{Country Codes}.

You might think that the country code specification is redundant. But in
fact, some languages have dialects in different countries. For example,
@samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil. The country
code serves to distinguish the dialects.

Many locale names have an extended syntax
@samp{@var{ll}_@var{CC}.@var{encoding}} that also specifies the character
encoding. These are in use because between 2000 and 2005, most users
switched to locales in UTF-8 encoding. For example, the German locale on
glibc systems is nowadays @samp{de_DE.UTF-8}. The older name @samp{de_DE}
still refers to the German locale as of 2000 that stores characters in
ISO-8859-1 encoding -- a text encoding that cannot even accommodate the Euro
currency sign.

Some locale names use @samp{@var{ll}_@var{CC}@@@var{variant}} instead of
@samp{@var{ll}_@var{CC}}.
The @samp{@@@var{variant}} can denote any kind of
characteristic that is not already implied by the language @var{ll} and
the country @var{CC}. It can denote a particular monetary unit. For example,
on glibc systems, @samp{de_DE@@euro} denotes the locale that uses the Euro
currency, in contrast to the older locale @samp{de_DE} which implies the use
of the currency before 2002. It can also denote a dialect of the language,
or the script used to write text (for example, @samp{sr_RS@@latin} uses the
Latin script, whereas @samp{sr_RS} uses the Cyrillic script to write Serbian),
or the orthography rules, or similar.

On other systems, some variations of this scheme are used, such as
@samp{@var{ll}}. You can get the list of locales supported by your system
for your language by running the command @samp{locale -a | grep '^@var{ll}'}.

There is also a special locale, called @samp{C}.
@c Don't mention that this locale also has the name "POSIX". When we talk about
@c the "POSIX locale", we mean the "locale as specified in the POSIX way", and
@c mentioning a locale called "POSIX" would bring total confusion.
When it is used, it disables all localization: in this locale, all programs
standardized by POSIX use English messages and an unspecified character
encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on
the operating system).

@node Locale Environment Variables
@subsection Locale Environment Variables
@cindex setting up @code{gettext} at run time
@cindex selecting message language
@cindex language selection

A locale is composed of several @emph{locale categories}, see @ref{Aspects}.
When a program looks up locale dependent values, it does this according to
the following environment variables, in priority order:

@enumerate
@vindex LANGUAGE@r{, environment variable}
@item @code{LANGUAGE}
@vindex LC_ALL@r{, environment variable}
@item @code{LC_ALL}
@vindex LC_CTYPE@r{, environment variable}
@vindex LC_NUMERIC@r{, environment variable}
@vindex LC_TIME@r{, environment variable}
@vindex LC_COLLATE@r{, environment variable}
@vindex LC_MONETARY@r{, environment variable}
@vindex LC_MESSAGES@r{, environment variable}
@item @code{LC_xxx}, according to selected locale category:
@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
@code{LC_MONETARY}, @code{LC_MESSAGES}, ...
@vindex LANG@r{, environment variable}
@item @code{LANG}
@end enumerate

Variables whose value is set but is empty are ignored in this lookup.

@code{LANG} is the normal environment variable for specifying a locale.
As a user, you normally set this variable (unless some of the other variables
have already been set by the system, in @file{/etc/profile} or similar
initialization files).

@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
@code{LC_MONETARY}, @code{LC_MESSAGES}, and so on, are the environment
variables meant to override @code{LANG} and affecting a single locale
category only. For example, assume you are a Swedish user in Spain, and you
want your programs to handle numbers and dates according to Spanish
conventions, and only the messages should be in Swedish. Then you could
create a locale named @samp{sv_ES} or @samp{sv_ES.UTF-8} by use of the
@code{localedef} program.
But it is simpler, and achieves the same effect,
to set the @code{LANG} variable to @code{es_ES.UTF-8} and the
@code{LC_MESSAGES} variable to @code{sv_SE.UTF-8}; these two locales come
already preinstalled with the operating system.

@code{LC_ALL} is an environment variable that overrides all of these.
It is typically used in scripts that run particular programs. For example,
@code{configure} scripts generated by GNU autoconf use @code{LC_ALL} to make
sure that the configuration tests don't operate in locale dependent ways.

Some systems, unfortunately, set @code{LC_ALL} in @file{/etc/profile} or in
similar initialization files. As a user, you therefore have to unset this
variable if you want to set @code{LANG} and optionally some of the other
@code{LC_xxx} variables.

The @code{LANGUAGE} variable is described in the next subsection.

@node The LANGUAGE variable
@subsection Specifying a Priority List of Languages

Not all programs have translations for all languages. By default, an
English message is shown in place of a nonexistent translation. If you
understand other languages, you can set up a priority list of languages.
This is done through a different environment variable, called
@code{LANGUAGE}. GNU @code{gettext} gives preference to @code{LANGUAGE}
over @code{LC_ALL} and @code{LANG} for the purpose of message handling,
but you still need to have @code{LANG} (or @code{LC_ALL}) set to the primary
language; this is required by other parts of the system libraries.
For example, some Swedish users who would rather read translations in
German than in English when Swedish is not available set @code{LANGUAGE}
to @samp{sv:de} while leaving @code{LANG} set to @samp{sv_SE}.

Special advice for Norwegian users: The language code for Norwegian
bokm@ringaccent{a}l changed from @samp{no} to @samp{nb} recently (in 2003).
During the transition period, while some message catalogs for this language
are installed under @samp{nb} and some older ones under @samp{no}, it is
recommended for Norwegian users to set @code{LANGUAGE} to @samp{nb:no} so that
both newer and older translations are used.

In the @code{LANGUAGE} environment variable, but not in the other
environment variables, @samp{@var{ll}_@var{CC}} combinations can be
abbreviated as @samp{@var{ll}} to denote the language's main dialect.
For example, @samp{de} is equivalent to @samp{de_DE} (German as spoken in
Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as spoken in Portugal)
in this context.

Note: The variable @code{LANGUAGE} is ignored if the locale is set to
@samp{C}. In other words, you have to first enable localization, by setting
@code{LANG} (or @code{LC_ALL}) to a value other than @samp{C}, before you can
use a language priority list through the @code{LANGUAGE} variable.

@node Working in a Windows console
@section Obtaining good output in a Windows console
@cindex Windows
@cindex ANSI encoding
@cindex OEM encoding
@vindex OUTPUT_CHARSET@r{, environment variable}

On Windows, consoles such as the one started by the @code{cmd.exe}
program do input and output in an encoding, called ``OEM code page'',
that is different from the encoding that text-mode programs usually use,
called ``ANSI code page''. (Note: This problem does not exist for
Cygwin consoles; these consoles do input and output in the UTF-8
encoding.) As a workaround, you may request that the programs produce
output in this ``OEM'' encoding.
To do so, set the environment variable
@code{OUTPUT_CHARSET} to the ``OEM'' encoding, through a command such as
@smallexample
set OUTPUT_CHARSET=CP850
@end smallexample
Note: This has an effect only on strings looked up in message catalogs;
other categories of text are usually not affected by this setting.
Note also that this environment variable also affects output sent to a
file or to a pipe; output to a file is most often expected to be in the
``ANSI'' or in the UTF-8 encoding.

Here are examples of the ``ANSI'' and ``OEM'' code pages:

@multitable @columnfractions .5 .25 .25
@headitem Territories @tie{} @tab @tie{} ANSI encoding @tie{} @tab @tie{} OEM encoding
@item Western Europe @tie{} @tab @tie{} CP1252 @tie{} @tab @tie{} CP850
@item Slavic countries (Latin 2) @tie{} @tab @tie{} CP1250 @tie{} @tab @tie{} CP852
@item Baltic countries @tie{} @tab @tie{} CP1257 @tie{} @tab @tie{} CP775
@item Russia @tie{} @tab @tie{} CP1251 @tie{} @tab @tie{} CP866
@end multitable

@node Installing Localizations
@section Installing Translations for Particular Programs
@cindex Translation Matrix
@cindex available translations

Languages are not equally well supported in all packages using GNU
@code{gettext}, and more translations are added over time. Usually, you
use the translations that are shipped with the operating system
or with particular packages that you install afterwards. But you can also
install newer localizations directly. To do this, you need to
understand where each localization file is stored on the file system.

@cindex @file{ABOUT-NLS} file
For programs that participate in the Translation Project, you can start
looking for translations here:
@url{https://translationproject.org/team/index.html}.

For programs that are part of the KDE project, the starting point is:
@url{https://l10n.kde.org/}.

For programs that are part of the GNOME project, the starting point is:
@url{https://wiki.gnome.org/TranslationProject}.

For other programs, you may check whether the program's source code package
contains some @file{@var{ll}.po} files; often they are kept together in a
directory called @file{po/}. Each @file{@var{ll}.po} file contains the
message translations for the language whose abbreviation is @var{ll}.

@node PO Files
@chapter The Format of PO Files
@cindex PO files' format
@cindex file format, @file{.po}

The GNU @code{gettext} toolset helps programmers and translators
in producing, updating and using translation files, mainly those
PO files which are textual, editable files. This chapter explains
the format of PO files.

A PO file is made up of many entries, each entry holding the relation
between an original untranslated string and its corresponding
translation. All entries in a given PO file usually pertain
to a single project, and all translations are expressed in a single
target language. One PO file @dfn{entry} has the following schematic
structure:

@example
@var{white-space}
#  @var{translator-comments}
#. @var{extracted-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
#| msgid @var{previous-untranslated-string}
msgid @var{untranslated-string}
msgstr @var{translated-string}
@end example

The general structure of a PO file should be well understood by
the translator. When using PO mode, very little has to be known
about the format details, as PO mode takes care of them for her.
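
The line prefixes in the schematic structure above can be told apart
mechanically. The following C function is a minimal sketch, not the
parser used by the GNU @code{gettext} tools; the name
@code{po_line_kind} and the labels it returns are hypothetical and
chosen only to mirror the schematic.

@example
#include <string.h>

/* Classify one line of a PO file entry by its leading characters.
   The returned labels are made up for this illustration.  */
const char *
po_line_kind (const char *line)
@{
  if (line[0] == '#')
    switch (line[1])
      @{
      case '.':
        return "extracted-comment";
      case ':':
        return "reference";
      case ',':
        return "flag";
      case '|':
        return "previous-untranslated-string";
      case ' ': case '\t': case '\n': case '\0':
        /* White space right after '#': written by the translator.  */
        return "translator-comment";
      default:
        return "other-automatic-comment";
      @}
  if (strncmp (line, "msgid", 5) == 0)
    return "untranslated-string";
  if (strncmp (line, "msgstr", 6) == 0)
    return "translated-string";
  return "other";
@}
@end example

@noindent
For instance, @code{po_line_kind ("#, fuzzy")} returns the label
@code{"flag"}, while a blank line or a continuation line of a quoted
string yields @code{"other"}.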

A simple entry can look like this:

@example
#: lib/error.c:116
msgid "Unknown system error"
msgstr "Error desconegut del sistema"
@end example

@cindex comments, translator
@cindex comments, automatic
@cindex comments, extracted
Entries begin with some optional white space.  Usually, when generated
through GNU @code{gettext} tools, there is exactly one blank line
between entries.  Then comments follow, on lines all starting with the
character @code{#}.  There are two kinds of comments: those which have
some white space immediately following the @code{#} (the @var{translator
comments}), which are created and maintained exclusively by the
translator, and those which have some non-white character just after the
@code{#} (the @var{automatic comments}), which are created and
maintained automatically by GNU @code{gettext} tools.  Comment lines
starting with @code{#.} contain comments given by the programmer, directed
at the translator; these comments are called @var{extracted comments}
because the @code{xgettext} program extracts them from the program's
source code.  Comment lines starting with @code{#:} contain references to
the program's source code.  Comment lines starting with @code{#,} contain
flags; more about these below.  Comment lines starting with @code{#|}
contain the previous untranslated string for which the translator gave
a translation.

All comments, of either kind, are optional.

@kwindex msgid
@kwindex msgstr
After white space and comments, entries show two strings, namely
first the untranslated string as it appears in the original program
sources, and then, the translation of this string.  The original
string is introduced by the keyword @code{msgid}, and the translation,
by @code{msgstr}.
The two strings, untranslated and translated,
are quoted in various ways in the PO file, using @code{"}
delimiters and @code{\} escapes, but the translator does not really
have to pay attention to the precise quoting format, as PO mode fully
takes care of quoting for her.

The @code{msgid} strings, as well as automatic comments, are produced
and managed by other GNU @code{gettext} tools, and PO mode does not
provide means for the translator to alter these.  The most she can
do is to delete them, and only by deleting the whole entry.
On the other hand, the @code{msgstr} string, as well as translator
comments, are really meant for the translator, and PO mode gives her
the full control she needs.

The comment lines beginning with @code{#,} are special because they are
not completely ignored by the programs as comments generally are.  The
comma separated list of @var{flag}s is used by the @code{msgfmt}
program to give the user some better diagnostic messages.  Currently
the following flags are defined:

@table @code
@item fuzzy
@kwindex fuzzy@r{ flag}
This flag can be generated by the @code{msgmerge} program or it can be
inserted by the translator herself.  It shows that the @code{msgstr}
string might not be a correct translation (anymore).  Only the translator
can judge if the translation requires further modification, or is
acceptable as is.  Once satisfied with the translation, she then removes
this @code{fuzzy} attribute.  The @code{msgmerge} program inserts this
flag when it has combined the @code{msgid} and @code{msgstr} entries
after fuzzy search only.  @xref{Fuzzy Entries}.

@item c-format
@kwindex c-format@r{ flag}
@itemx no-c-format
@kwindex no-c-format@r{ flag}
These flags should not be added by a human.  Instead only the
@code{xgettext} program adds them.
In an automated PO file processing
system as proposed here, the user's changes would be thrown away again as
soon as the @code{xgettext} program generates a new template file.

The @code{c-format} flag indicates that the untranslated string and the
translation are supposed to be C format strings.  The @code{no-c-format}
flag indicates that they are not C format strings, even though the untranslated
string happens to look like a C format string (with @samp{%} directives).

When the @code{c-format} flag is given for a string the @code{msgfmt}
program does some more tests to check the validity of the translation.
@xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}.

@item objc-format
@kwindex objc-format@r{ flag}
@itemx no-objc-format
@kwindex no-objc-format@r{ flag}
Likewise for Objective C, see @ref{objc-format}.

@item python-format
@kwindex python-format@r{ flag}
@itemx no-python-format
@kwindex no-python-format@r{ flag}
Likewise for Python, see @ref{python-format}.

@item python-brace-format
@kwindex python-brace-format@r{ flag}
@itemx no-python-brace-format
@kwindex no-python-brace-format@r{ flag}
Likewise for Python brace, see @ref{python-format}.

@item java-format
@kwindex java-format@r{ flag}
@itemx no-java-format
@kwindex no-java-format@r{ flag}
Likewise for Java @code{MessageFormat} format strings, see @ref{java-format}.

@item java-printf-format
@kwindex java-printf-format@r{ flag}
@itemx no-java-printf-format
@kwindex no-java-printf-format@r{ flag}
Likewise for Java @code{printf} format strings, see @ref{java-format}.

@item csharp-format
@kwindex csharp-format@r{ flag}
@itemx no-csharp-format
@kwindex no-csharp-format@r{ flag}
Likewise for C#, see @ref{csharp-format}.

@item javascript-format
@kwindex javascript-format@r{ flag}
@itemx no-javascript-format
@kwindex no-javascript-format@r{ flag}
Likewise for JavaScript, see @ref{javascript-format}.

@item scheme-format
@kwindex scheme-format@r{ flag}
@itemx no-scheme-format
@kwindex no-scheme-format@r{ flag}
Likewise for Scheme, see @ref{scheme-format}.

@item lisp-format
@kwindex lisp-format@r{ flag}
@itemx no-lisp-format
@kwindex no-lisp-format@r{ flag}
Likewise for Lisp, see @ref{lisp-format}.

@item elisp-format
@kwindex elisp-format@r{ flag}
@itemx no-elisp-format
@kwindex no-elisp-format@r{ flag}
Likewise for Emacs Lisp, see @ref{elisp-format}.

@item librep-format
@kwindex librep-format@r{ flag}
@itemx no-librep-format
@kwindex no-librep-format@r{ flag}
Likewise for librep, see @ref{librep-format}.

@item ruby-format
@kwindex ruby-format@r{ flag}
@itemx no-ruby-format
@kwindex no-ruby-format@r{ flag}
Likewise for Ruby, see @ref{ruby-format}.

@item sh-format
@kwindex sh-format@r{ flag}
@itemx no-sh-format
@kwindex no-sh-format@r{ flag}
Likewise for Shell, see @ref{sh-format}.

@item awk-format
@kwindex awk-format@r{ flag}
@itemx no-awk-format
@kwindex no-awk-format@r{ flag}
Likewise for awk, see @ref{awk-format}.

@item lua-format
@kwindex lua-format@r{ flag}
@itemx no-lua-format
@kwindex no-lua-format@r{ flag}
Likewise for Lua, see @ref{lua-format}.

@item object-pascal-format
@kwindex object-pascal-format@r{ flag}
@itemx no-object-pascal-format
@kwindex no-object-pascal-format@r{ flag}
Likewise for Object Pascal, see @ref{object-pascal-format}.

@item smalltalk-format
@kwindex smalltalk-format@r{ flag}
@itemx no-smalltalk-format
@kwindex no-smalltalk-format@r{ flag}
Likewise for Smalltalk, see @ref{smalltalk-format}.

@item qt-format
@kwindex qt-format@r{ flag}
@itemx no-qt-format
@kwindex no-qt-format@r{ flag}
Likewise for Qt, see @ref{qt-format}.

@item qt-plural-format
@kwindex qt-plural-format@r{ flag}
@itemx no-qt-plural-format
@kwindex no-qt-plural-format@r{ flag}
Likewise for Qt plural forms, see @ref{qt-plural-format}.

@item kde-format
@kwindex kde-format@r{ flag}
@itemx no-kde-format
@kwindex no-kde-format@r{ flag}
Likewise for KDE, see @ref{kde-format}.

@item boost-format
@kwindex boost-format@r{ flag}
@itemx no-boost-format
@kwindex no-boost-format@r{ flag}
Likewise for Boost, see @ref{boost-format}.

@item tcl-format
@kwindex tcl-format@r{ flag}
@itemx no-tcl-format
@kwindex no-tcl-format@r{ flag}
Likewise for Tcl, see @ref{tcl-format}.

@item perl-format
@kwindex perl-format@r{ flag}
@itemx no-perl-format
@kwindex no-perl-format@r{ flag}
Likewise for Perl, see @ref{perl-format}.

@item perl-brace-format
@kwindex perl-brace-format@r{ flag}
@itemx no-perl-brace-format
@kwindex no-perl-brace-format@r{ flag}
Likewise for Perl brace, see @ref{perl-format}.

@item php-format
@kwindex php-format@r{ flag}
@itemx no-php-format
@kwindex no-php-format@r{ flag}
Likewise for PHP, see @ref{php-format}.

@item gcc-internal-format
@kwindex gcc-internal-format@r{ flag}
@itemx no-gcc-internal-format
@kwindex no-gcc-internal-format@r{ flag}
Likewise for the GCC sources, see @ref{gcc-internal-format}.

@item gfc-internal-format
@kwindex gfc-internal-format@r{ flag}
@itemx no-gfc-internal-format
@kwindex no-gfc-internal-format@r{ flag}
Likewise for the GNU Fortran Compiler sources, see @ref{gfc-internal-format}.

@item ycp-format
@kwindex ycp-format@r{ flag}
@itemx no-ycp-format
@kwindex no-ycp-format@r{ flag}
Likewise for YCP, see @ref{ycp-format}.

@end table

@kwindex msgctxt
@cindex context, in PO files
It is also possible to have entries with a context specifier.  They look like
this:

@example
@var{white-space}
#  @var{translator-comments}
#. @var{extracted-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
#| msgctxt @var{previous-context}
#| msgid @var{previous-untranslated-string}
msgctxt @var{context}
msgid @var{untranslated-string}
msgstr @var{translated-string}
@end example

The context serves to disambiguate messages with the same
@var{untranslated-string}.  It is possible to have several entries with
the same @var{untranslated-string} in a PO file, provided that they each
have a different @var{context}.  Note that an empty @var{context} string
and an absent @code{msgctxt} line do not mean the same thing.

@kwindex msgid_plural
@cindex plural forms, in PO files
A different kind of entry is used for translations which involve
plural forms.

@example
@var{white-space}
#  @var{translator-comments}
#. @var{extracted-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
#| msgid @var{previous-untranslated-string-singular}
#| msgid_plural @var{previous-untranslated-string-plural}
msgid @var{untranslated-string-singular}
msgid_plural @var{untranslated-string-plural}
msgstr[0] @var{translated-string-case-0}
...
msgstr[N] @var{translated-string-case-n}
@end example

Such an entry can look like this:

@example
#: src/msgcmp.c:338 src/po-lex.c:699
#, c-format
msgid "found %d fatal error"
msgid_plural "found %d fatal errors"
msgstr[0] "s'ha trobat %d error fatal"
msgstr[1] "s'han trobat %d errors fatals"
@end example

Here also, a @code{msgctxt} context can be specified before @code{msgid},
like above.

Here, additional kinds of flags can be used:

@table @code
@item range:
@kwindex range:@r{ flag}
This flag is followed by a range of non-negative numbers, using the syntax
@code{range: @var{minimum-value}..@var{maximum-value}}.  It designates the
possible values that the numeric parameter of the message can take.  In some
languages, translators may produce slightly better translations if they know
that the value can only take on values between 0 and 10, for example.
@end table

The @var{previous-untranslated-string} is optionally inserted by the
@code{msgmerge} program, at the same time as it marks a message fuzzy.
It helps the translator to see which changes were made by the developers
to the @var{untranslated-string}.

It happens that some lines, usually whitespace or comments, follow the
very last entry of a PO file.  Such lines are not part of any entry,
and will be dropped when the PO file is processed by the tools, or may
disturb some PO file editors.

The remainder of this section may be safely skipped by those using
a PO file editor, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file.  On the other hand, those
wishing to modify PO files by hand should carefully continue reading on.

An empty @var{untranslated-string} is reserved to contain the header
entry with the meta information (@pxref{Header Entry}).
This header
entry should be the first entry of the file.  The empty
@var{untranslated-string} is reserved for this purpose and must
not be used anywhere else.

Each of @var{untranslated-string} and @var{translated-string} respects
the C syntax for a character string, including the surrounding quotes
and embedded backslashed escape sequences.  When the time comes
to write multi-line strings, one should not use escaped newlines.
Instead, a closing quote should follow the last character on the
line to be continued, and an opening quote should resume the string
at the beginning of the following PO file line.  For example:

@example
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
@end example

@noindent
In this example, the empty string is used on the first line, to
allow better alignment of the @code{H} from the word @samp{Here}
over the @code{f} from the word @samp{for}.  In this example, the
@code{msgid} keyword is followed by three strings, which are meant
to be concatenated.  Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the necessity of @code{msgid} to be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner disposition.
The empty string could have
been omitted, but only if the string starting with @samp{Here} was
moved up to the first line, right after @code{msgid}.@footnote{This
limitation is not imposed by GNU @code{gettext}, but is for compatibility
with the @code{msgfmt} implementation on Solaris.}  Nor was it really
necessary to switch between the two last quoted strings immediately after
the newline @samp{\n}; the switch could have occurred after @emph{any}
other character.  We just did it this way because it is neater.

@cindex newlines in PO files
One should carefully distinguish between end of lines marked as
@samp{\n} @emph{inside} quotes, which are part of the represented
string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.

@cindex comments in PO files
Outside strings, white lines and comments may be used freely.
Comments start at the beginning of a line with @samp{#} and extend
until the end of the PO file line.  Comments written by translators
should have the initial @samp{#} immediately followed by some white
space.  If the @samp{#} is not immediately followed by white space,
this comment is most likely generated and managed by specialized GNU
tools, and might disappear or be replaced unexpectedly when the PO
file is given to @code{msgmerge}.

@node Sources
@chapter Preparing Program Sources
@cindex preparing programs for translation

@c FIXME: Rewrite (the whole chapter).

For the programmer, changes to the C source code fall into three
categories.  First, you have to make the localization functions
known to all modules needing message translation.  Second, you should
properly trigger the operation of GNU @code{gettext} when the program
initializes, usually from the @code{main} function.
Last, you should
identify, adjust and mark all constant strings in your program
needing translation.

@menu
* Importing::                   Importing the @code{gettext} declaration
* Triggering::                  Triggering @code{gettext} Operations
* Preparing Strings::           Preparing Translatable Strings
* Mark Keywords::               How Marks Appear in Sources
* Marking::                     Marking Translatable Strings
* c-format Flag::               Telling something about the following string
* Special cases::               Special Cases of Translatable Strings
* Bug Report Address::          Letting Users Report Translation Bugs
* Names::                       Marking Proper Names for Translation
* Libraries::                   Preparing Library Sources
@end menu

@node Importing
@section Importing the @code{gettext} declaration

Presuming that your set of programs, or package, has been adjusted
so all needed GNU @code{gettext} files are available, and your
@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
having translated C strings should contain the line:

@cindex include file @file{libintl.h}
@example
#include <libintl.h>
@end example

Similarly, each C module containing @code{printf()}/@code{fprintf()}/...
calls with a format string that could be a translated C string (even if
the C string comes from a different C module) should contain the line:

@example
#include <libintl.h>
@end example

@node Triggering
@section Triggering @code{gettext} Operations

@cindex initialization
The initialization of locale data should be done with more or less
the same code in every program, as demonstrated below:

@example
@group
int
main (int argc, char *argv[])
@{
  @dots{}
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  @dots{}
@}
@end group
@end example

@var{PACKAGE} and @var{LOCALEDIR} should be provided either by
@file{config.h} or by the Makefile.  For now consult the @code{gettext}
or @code{hello} sources for more information.

@cindex locale category, LC_ALL
@cindex locale category, LC_CTYPE
The use of @code{LC_ALL} might not be appropriate for you.
@code{LC_ALL} includes all locale categories and especially
@code{LC_CTYPE}.  This latter category determines the character
classes reported by @code{isalnum} and the other functions from
@file{ctype.h}, which can be wrong, especially for programs that
process some kind of input language.  For example, this would mean that
source code using the @,{c} (c-cedilla character) is runnable in
France but not in the U.S.

Some systems also have problems with parsing numbers using the
@code{scanf} functions if a locale category other than @code{LC_ALL} is
used.  The standards say that formats in addition to the one known in the
@code{"C"} locale might be recognized.  But some systems seem to reject
numbers in the @code{"C"} locale format.
In some situations, the notation itself might
also be a problem, making it impossible to
recognize whether the number is in the @code{"C"} locale or the local
format.  This can happen if thousands separator characters are used.
Some locales define this character according to the national
conventions as @code{'.'}, which is the same character used in the
@code{"C"} locale to denote the decimal point.

So it is sometimes necessary to replace the @code{LC_ALL} line in the
code above by a sequence of @code{setlocale} lines:

@example
@group
@{
  @dots{}
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  @dots{}
@}
@end group
@end example

@cindex locale category, LC_CTYPE
@cindex locale category, LC_COLLATE
@cindex locale category, LC_MONETARY
@cindex locale category, LC_NUMERIC
@cindex locale category, LC_TIME
@cindex locale category, LC_MESSAGES
@cindex locale category, LC_RESPONSES
@noindent
On all POSIX conformant systems the locale categories @code{LC_CTYPE},
@code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
@code{LC_NUMERIC}, and @code{LC_TIME} are available.  On some systems
which are only ISO C compliant, @code{LC_MESSAGES} is missing, but
a substitute for it is defined in GNU gettext's @code{<libintl.h>} and
in GNU gnulib's @code{<locale.h>}.

Note that changing the @code{LC_CTYPE} also affects the functions
declared in the @code{<ctype.h>} standard header and some functions
declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers.
If this is not
desirable in your application (for example in a compiler's parser),
you can use a set of substitute functions which hardwire the C locale,
such as found in the modules @samp{c-ctype}, @samp{c-strcase},
@samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib
source distribution.
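
As a minimal sketch of this approach, and not a definitive
implementation, the fragment below sets only the @code{LC_CTYPE} and
@code{LC_MESSAGES} categories and classifies characters with a
hand-written helper that hardwires the C locale.  The names
@code{init_locale} and @code{ascii_isalnum} are hypothetical and exist
only for this illustration; a real package would more likely use the
gnulib @samp{c-ctype} module.  The sketch also assumes an
ASCII-compatible execution character set and a system whose
@code{<locale.h>} defines @code{LC_MESSAGES}.

@example
@group
#include <locale.h>
#include <stdbool.h>

/* Hypothetical helper: true for ASCII letters and digits only.
   The result does not depend on the current LC_CTYPE locale.
   Assumes an ASCII-compatible execution character set.  */
static bool
ascii_isalnum (int c)
@{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
@}

/* Hypothetical helper: set only the categories the program needs,
   instead of LC_ALL.  */
static void
init_locale (void)
@{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
@}
@end group
@end example

@noindent
A parser would call @code{init_locale} once at startup and then use
@code{ascii_isalnum} wherever it must not be influenced by the user's
locale.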

It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
normally avoided because a @code{setlocale} call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.

@node Preparing Strings
@section Preparing Translatable Strings

@cindex marking strings, preparations
Before strings can be marked for translation, they sometimes need to
be adjusted.  Usually preparing a string for translation is done right
before marking it, during the marking phase which is described in the
next sections.  What you have to keep in mind while doing that is the
following.

@itemize @bullet
@item
Decent English style.

@item
Entire sentences.

@item
Split at paragraphs.

@item
Use format strings instead of string concatenation.

@item
Use placeholders in format strings instead of embedded URLs.

@item
Avoid unusual markup and unusual control characters.
@end itemize

@noindent
Let's look at some examples of these guidelines.

@subheading Decent English style

@cindex style
Translatable strings should be in good English style.  If slang language
with abbreviations and shortcuts is used, often translators will not
understand the message and will produce very inappropriate translations.

@example
"%s: is parameter\n"
@end example

@noindent
This is nearly untranslatable: Is the displayed item @emph{a} parameter or
@emph{the} parameter?

@example
"No match"
@end example

@noindent
The ambiguity in this message makes it unintelligible: Is the program
attempting to set something on fire?  Does it mean "The given object does
not match the template"?
Does it mean "The template does not fit for any
of the objects"?

@cindex ambiguities
In both cases, adding more words to the message will help both the
translator and the English speaking user.

@subheading Entire sentences

@cindex sentences
Translatable strings should be entire sentences.  It is often not possible
to translate single verbs or adjectives in a substitutable way.

@example
printf ("File %s is %s protected", filename, rw ? "write" : "read");
@end example

@noindent
Most translators will not look at the source and will thus only see the
string @code{"File %s is %s protected"}, which is unintelligible.  Change
this to

@example
printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);
@end example

@noindent
This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction.  A French
translator for example translates "write protected" as "protected
against writing".

Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence.  There are
usually more interdependencies between words than in English.  The
consequence is that asking a translator to translate two half-sentences
and then combining these two half-sentences through dumb string concatenation
will not work, for many languages, even though it would work for English.
That's why translators need to handle entire sentences.

Often sentences don't fit into a single line.
If a sentence is output
using two subsequent @code{printf} statements, like this

@example
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);
@end example

@noindent
the translator would have to translate two half sentences, but nothing
in the POT file would tell her that the two half sentences belong together.
It is necessary to merge the two @code{printf} statements so that the
translator can handle the entire sentence at once and decide at which
place to insert a line break in the translation (if at all):

@example
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);
@end example

You may now ask: how about two or more adjacent sentences?  Like in this case:

@example
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");
@end example

@noindent
Should these two statements be merged into a single one?  I would recommend
merging them if the two sentences are related to each other, because then it
makes it easier for the translator to understand and translate both.  On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do a favour to the translator by not
merging the two.  (Identical messages occurring in several places are
combined by @code{xgettext}, so the translator has to handle them once only.)

@subheading Split at paragraphs

@cindex paragraphs
Translatable strings should be limited to one paragraph; don't let a
single message be longer than ten lines.  The reason is that when the
translatable string changes, the translator is faced with the task of
updating the entire translated string.
Maybe only a single word will
have changed in the English string, but the translator doesn't see that
(with the current translation tools), therefore she has to proofread
the entire message.

@cindex help option
Many GNU programs have a @samp{--help} output that extends over several
screen pages.  It is a courtesy towards the translators to split such a
message into several ones of five to ten lines each.  While doing that,
you can also attempt to split the documented options into groups,
such as the input options, the output options, and the informative
output options.  This will help every user to find the option he is
looking for.

@subheading No string concatenation

@cindex string concatenation
@cindex concatenation of strings
Hardcoded string concatenation is sometimes used to construct English
strings:

@example
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
@end example

@noindent
In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order
of @code{object1} and @code{object2}, it is necessary to change this
to use a format string:

@example
sprintf (s, "Replace %s with %s?", object1, object2);
@end example

@cindex @code{inttypes.h}
A similar case is compile time concatenation of strings.  The ISO C 99
include file @code{<inttypes.h>} contains a macro @code{PRId64} that
can be used as a formatting directive for outputting an @samp{int64_t}
integer through @code{printf}.  It expands to a constant string, usually
"d" or "ld" or "lld" or something like this, depending on the platform.
Assume you have code like

@example
printf ("The amount is %0" PRId64 "\n", number);
@end example

@noindent
The @code{gettext} tools and library have special support for these
@code{<inttypes.h>} macros.  You can therefore simply write

@example
printf (gettext ("The amount is %0" PRId64 "\n"), number);
@end example

@noindent
The PO file will contain the string "The amount is %0<PRId64>\n".
The translators will provide a translation containing "%0<PRId64>"
as well, and at runtime the @code{gettext} function's result will
contain the appropriate constant string, "d" or "ld" or "lld".

This works only for the predefined @code{<inttypes.h>} macros.  If
you have defined your own similar macros, let's say @samp{MYPRId64},
that are not known to @code{xgettext}, the solution for this problem
is to change the code like this:

@example
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
@end example

That is, you put the platform dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether in decimal, octal or hexadecimal.

@cindex Java, string concatenation
@cindex C#, string concatenation
All this applies to other programming languages as well.  For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator.
As in C, in Java you would change

@example
System.out.println("Replace "+object1+" with "+object2+"?");
@end example

@noindent
into a statement involving a format string:

@example
System.out.println(
    MessageFormat.format("Replace @{0@} with @{1@}?",
                         new Object[] @{ object1, object2 @}));
@end example

@noindent
Similarly, in C#, you would change

@example
Console.WriteLine("Replace "+object1+" with "+object2+"?");
@end example

@noindent
into a statement involving a format string:

@example
Console.WriteLine(
    String.Format("Replace @{0@} with @{1@}?", object1, object2));
@end example

@subheading No embedded URLs

It is good not to embed URLs in translatable strings, for several reasons:
@itemize @bullet
@item
It avoids possible mistakes during copy and paste.
@item
Translators cannot translate the URLs or, by mistake, use the URLs from
other packages that are present in their compendium.
@item
When the URLs change, translators don't need to revisit the translation
of the string.
@end itemize

The same holds for email addresses.

So, you would change

@example
fputs (_("GNU GPL version 3 <https://gnu.org/licenses/gpl.html>\n"),
       stream);
@end example

@noindent
to

@example
fprintf (stream, _("GNU GPL version 3 <%s>\n"),
         "https://gnu.org/licenses/gpl.html");
@end example

@subheading No unusual markup

@cindex markup
@cindex control characters
Unusual markup or control characters should not be used in translatable
strings.  Translators will likely not understand the particular meaning
of the markup or control characters.

For example, if you have a convention that @samp{|} delimits the
left-hand and right-hand part of some GUI elements, translators will
often not understand it without specific comments. It might be
better to have the translator translate the left-hand and right-hand
part separately.

Another example is the @samp{argp} convention to use a single @samp{\v}
(vertical tab) control character to delimit two sections inside a
string. This is flawed. Some translators may convert it to a simple
newline, some to blank lines. With some PO file editors it may not be
easy to even enter a vertical tab control character. So, you cannot
be sure that the translation will contain a @samp{\v} character, at the
corresponding position. The solution is, again, to let the translator
translate two separate strings and combine at run-time the two translated
strings with the @samp{\v} required by the convention.

HTML markup, however, is common enough that it's probably ok to use in
translatable strings. But please bear in mind that the GNU gettext tools
don't verify that the translations are well-formed HTML.

@node Mark Keywords
@section How Marks Appear in Sources
@cindex marking strings that require translation

All strings requiring translation should be marked in the C sources. Marking
is done in such a way that each translatable string appears to be
the sole argument of some function or preprocessor macro. There are
only a few such possible functions or macros meant for translation,
and their names are said to be marking keywords. The marking is
attached to strings themselves, rather than to what we do with them.
This approach has more uses. A blatant example is an error message
produced by formatting.
The format string needs translation, as
well as some strings inserted through some @samp{%s} specification
in the format, while the result from @code{sprintf} may have so many
different instances that it is impractical to list them all in some
@samp{error_string_out()} routine, say.

This marking operation has two goals. The first goal of marking
is to trigger the retrieval of the translation, at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string. Most localizable strings are found in executable
positions, that is, attached to variables or given as parameters to
functions. But this is not universal usage, and some translatable
strings appear in structured initializations. @xref{Special cases}.

The second goal of the marking operation is to help @code{xgettext}
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.

The canonical keyword for marking translatable strings is
@samp{gettext}; it gave its name to the whole GNU @code{gettext}
package. For packages making only light use of the @samp{gettext}
keyword, macro or function, it is easily used @emph{as is}. However,
for packages using the @code{gettext} interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name. Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized. Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.

@cindex @code{_}, a macro to mark strings for translation
Many packages use @samp{_} (a simple underline) as a keyword,
and write @samp{_("Translatable string")} instead of @samp{gettext
("Translatable string")}. Further, the coding rule from the GNU standards
that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underline and the two parentheses.
However, even if GNU @code{gettext} uses this convention internally,
it does not offer it officially. The real, genuine keyword is truly
@samp{gettext} indeed. It is fairly easy for those wanting to use
@samp{_} instead of @samp{gettext} to declare:

@example
#include <libintl.h>
#define _(String) gettext (String)
@end example

@noindent
instead of merely using @samp{#include <libintl.h>}.

The marking keywords @samp{gettext} and @samp{_} take the translatable
string as sole argument. It is also possible to define marking functions
that take it at another argument position. It is even possible to make
the marked argument position depend on the total number of arguments of
the function call; this is useful in C++. All this is achieved using
@code{xgettext}'s @samp{--keyword} option. How to pass such an option
to @code{xgettext}, assuming that @code{gettextize} is used, is described
in @ref{po/Makevars} and @ref{AM_XGETTEXT_OPTION}.

Note also that long strings can be split across lines, into multiple
adjacent string tokens. Automatic string concatenation is performed
at compile time according to ISO C and ISO C++; @code{xgettext} also
supports this syntax.

Later on, the maintenance is relatively easy.
If, as a programmer,
you add or modify a string, you will have to ask yourself if the
new or altered string requires translation, and include it within
@samp{_()} if you think it should be translated. For example, @samp{"%s"}
is a string @emph{not} requiring translation. But
@samp{"%s: %d"} @emph{does} require translation, because in French, unlike
in English, it's customary to put a space before a colon.

@node Marking
@section Marking Translatable Strings
@emindex marking strings for translation

In PO mode, one set of features is meant more for the programmer than
for the translator, and allows him to interactively mark which strings,
in a set of program sources, are translatable, and which are not.
Even if it is a fairly easy job for a programmer to find and mark
such strings by other means, using any editor of his choice, PO mode
makes this work more comfortable. Further, this gives translators
who feel a little like programmers, or programmers who feel a little
like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.

@emindex @code{etags}, using for marking strings
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
prior to using these PO file commands. This is easy to do. In any
shell window, change the directory to the root of your project, then
execute a command resembling:

@example
etags src/*.[hc] lib/*.[hc]
@end example

@noindent
presuming here you want to process all @file{.h} and @file{.c} files
from the @file{src/} and @file{lib/} directories.
This command will
explore all said files and create a @file{TAGS} file in your root
directory, somewhat summarizing the contents using a special file
format Emacs can understand.

@emindex @file{TAGS}, and marking translatable strings
For packages following the GNU coding standards, there is
a make goal @code{tags} or @code{TAGS} which constructs the tag files in
all directories and for all files containing source code.

Once your @file{TAGS} file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
But these commands are necessarily driven from within a PO file
window, and it is likely that you do not even have such a PO file yet.
This is not a problem at all, as you may safely open a new, empty PO
file, mainly for using these commands. This empty PO file will slowly
fill in while you mark strings as translatable in your program sources.

@table @kbd
@item ,
@efindex ,@r{, PO Mode command}
Search through program sources for a string which looks like a
candidate for translation (@code{po-tags-search}).

@item M-,
@efindex M-,@r{, PO Mode command}
Mark the last string found with @samp{_()} (@code{po-mark-translatable}).

@item M-.
@efindex M-.@r{, PO Mode command}
Mark the last string found with a keyword taken from a set of possible
keywords. This command with a prefix allows some management of these
keywords (@code{po-select-mark-and-mark}).

@end table

@efindex po-tags-search@r{, PO Mode command}
The @kbd{,} (@code{po-tags-search}) command searches for the next
occurrence of a string which looks like a possible candidate for
translation, and displays the program source in another Emacs window,
positioned in such a way that the string is near the top of this other
window.
If the string is too big to fit whole in this window, it is
positioned so only its end is shown. In any case, the cursor
is left in the PO file window. If the shown string would be better
presented differently in different native languages, you may mark it
using @kbd{M-,} or @kbd{M-.}. Otherwise, you might rather ignore it
and skip to the next string by merely repeating the @kbd{,} command.

A string is a good candidate for translation if it contains a sequence
of three or more letters. A string containing at most two letters in
a row will be considered as a candidate if it has more letters than
non-letters. The command disregards strings containing no letters,
or isolated letters only. It also disregards strings within comments,
or strings already marked with some keyword PO mode knows (see below).

If you have never told Emacs about some @file{TAGS} file to use, the
command will request that you specify one from the minibuffer, the
first time you use the command. You may later change your @file{TAGS}
file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
which will ask you to name the precise @file{TAGS} file you want
to use. @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.

Each time you use the @kbd{,} command, the search resumes from where it was
left by the previous search, and goes through all program sources,
obeying the @file{TAGS} file, until all sources have been processed.
However, by giving a prefix argument to the command @w{(@kbd{C-u
,})}, you may request that the search be restarted all over again
from the first program source; but in this case, strings that you
recently marked as translatable will be automatically skipped.

Using this @kbd{,} command does not prevent the use of other regular
Emacs tags commands.
For example, regular @code{tags-search} or
@code{tags-query-replace} commands may be used without disrupting the
independent @kbd{,} search sequence. However, as implemented, the
@emph{initial} @kbd{,} command (or the @kbd{,} command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.

@efindex po-mark-translatable@r{, PO Mode command}
@efindex po-select-mark-and-mark@r{, PO Mode command}
The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
recently found string with the @samp{_} keyword. The @kbd{M-.}
(@code{po-select-mark-and-mark}) command will request that you type
one keyword from the minibuffer and use that keyword for marking
the string. Both commands will automatically create a new PO file
untranslated entry for the string being marked, and make it the
current entry (making it easy for you to immediately proceed to its
translation, if you feel like doing it right away). It is possible
that the modifications made to the program source by @kbd{M-,} or
@kbd{M-.} render some source line longer than 80 columns, forcing you
to break and re-indent this line differently. You may use the @kbd{O}
command from PO mode, or any other window changing command from
Emacs, to break out into the program source window, and do any
needed adjustments. You will have to use some regular Emacs command
to return the cursor to the PO file window, if you want command
@kbd{,} for the next string, say.

The @kbd{M-.} command has a few built-in speedups, so you do not
have to explicitly type all keywords all the time. The first such
speedup is that you are presented with a @emph{preferred} keyword,
which you may accept by merely typing @kbd{@key{RET}} at the prompt.
The second speedup is that you may type any non-ambiguous prefix of the
keyword you really mean, and the command will complete it automatically
for you. This also means that PO mode has to @emph{know} all
your possible keywords, and that it will not accept mistyped keywords.

If you reply @kbd{?} to the keyword request, the command gives a
list of all known keywords, from which you may choose. When the
command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
updating any program source or PO file buffer, and does some simple
keyword management instead. In this case, the command asks for a
keyword, written in full, which becomes a new allowed keyword for
later @kbd{M-.} commands. Moreover, this new keyword automatically
becomes the @emph{preferred} keyword for later commands. By typing
an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
changes the @emph{preferred} keyword and does nothing more.

All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
when scanning for strings, and strings already marked by any of those
known keywords are automatically skipped. If many PO files are opened
simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using @kbd{q}) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
is preferred for the @kbd{M-.} command. In fact, it is not useful to
prefer @samp{_}, as this one is already built into the @kbd{M-,} command.

@node c-format Flag
@section Special Comments preceding Keywords

@c FIXME document c-format and no-c-format.

@cindex format strings
In C programs strings are often used within calls of functions from the
@code{printf} family. The special thing about these format strings is
that they can contain format specifiers introduced with @kbd{%}. Assume
we have the code

@example
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
@end example

@noindent
A possible German translation for the above string might be:

@example
"%d Zeichen lang ist die Zeichenkette `%s'"
@end example

A C programmer, even if he cannot speak German, will recognize that
there is something wrong here. The order of the two format specifiers
has changed, but of course the order of the arguments in the @code{printf}
call has not. This will most probably lead to problems because now the
length of the string is regarded as the address.

To prevent errors at runtime caused by translations, the @code{msgfmt}
tool can check statically whether the arguments in the original and the
translation string match in type and number. If this is not the case
and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt}
will give an error and refuse to produce a MO file. Thus consistent
use of @samp{msgfmt -c} will catch the error, so that it cannot cause
problems at runtime.

@noindent
For the word order in the above German translation to be correct, one
would have to write

@example
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
@end example

@noindent
The routines in @code{msgfmt} know about this special notation.

Because not all strings in a program will be format strings, it is not
useful for @code{msgfmt} to test all the strings in the @file{.po} file.
This might cause problems because the string might contain what looks
like a format specifier, but the string is not used in @code{printf}.
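
As an illustration of the numbered-argument notation discussed above, here
is a minimal sketch that formats the same two arguments with format strings
whose specifiers appear in different orders. The helper name
@code{format_length_message} is hypothetical and not part of @code{gettext};
the sketch also assumes a C library whose @code{printf} family supports the
POSIX @samp{%@var{n}$} notation, as GNU libc does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration only (not a gettext API).
   FORMAT plays the role of the string returned by gettext() at run time.
   Argument 1 is always the string S, argument 2 is always its length,
   so a translation may refer to them as %1$s and %2$d in any order.  */
static int
format_length_message (char *buf, size_t size, const char *format,
                       const char *s)
{
  /* The cast matches the 'd' conversion, which expects an int.  */
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```

Called with the format @samp{"%2$d Zeichen lang ist die Zeichenkette `%1$s'"},
the helper prints the length first although it is passed second. With the
unnumbered @samp{%d @dots{} %s} variant, the same call would hand the string
pointer to @samp{%d}, which is the mismatch @samp{msgfmt -c} is meant to catch.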

Therefore @code{xgettext} adds a special tag to those messages it
thinks might be a format string. There is no absolute rule for this,
only a heuristic. In the @file{.po} file the entry is marked using the
@code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}).

@kwindex c-format@r{, and @code{xgettext}}
@kwindex no-c-format@r{, and @code{xgettext}}
The careful reader now might say that this again can cause problems.
The heuristic might guess it wrong. This is true and therefore
@code{xgettext} knows about a special kind of comment which lets
the programmer take over the decision. If in the same line as or
the immediately preceding line to the @code{gettext} keyword
the @code{xgettext} program finds a comment containing the words
@code{xgettext:c-format}, it will mark the string in any case with
the @code{c-format} flag. This kind of comment should be used when
@code{xgettext} does not recognize the string as a format string but
it really is one and it should be tested. Please note that when the
comment is in the same line as the @code{gettext} keyword, it must be
before the string to be translated.

This situation happens quite often. The @code{printf} function is often
called with strings which do not contain a format specifier. Of course
one would normally use @code{fputs} but it does happen. In this case
@code{xgettext} does not recognize this as a format string but what
happens if the translation introduces a valid format specifier? The
@code{printf} function will try to access one of the parameters but none
exists because the original code does not pass any parameters.

@code{xgettext} of course could make a wrong decision the other way
round, i.e.@: a string marked as a format string actually is not a format
string.
In this case the @code{msgfmt} might give too many warnings and
would prevent translating the @file{.po} file. The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string @code{xgettext:no-c-format}.

If a string is marked with @code{c-format} and this is not correct the
user can find out who is responsible for the decision. See
@ref{xgettext Invocation} to see how the @code{--debug} option can be
used for solving this problem.

@node Special cases
@section Special Cases of Translatable Strings

@cindex marking string initializers
The attentive reader might now point out that it is not always possible
to mark translatable strings with @code{gettext} or something like this.
Consider the following case:

@example
@group
@{
  static const char *messages[] = @{
    "some very meaningful message",
    "and another one"
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  @dots{}
@}
@end group
@end example

While it is no problem to mark the string @code{"a default message"} it
is not possible to mark the string initializers for @code{messages}.
What is to be done? We have to fulfill two tasks. First we have to mark the
strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
can find them, and second we have to translate the strings at runtime
before printing them.

The first task can be fulfilled by creating a new keyword, which names a
no-op. For the second we have to mark all access points to a string
from the array.
So one solution can look like this:

@example
@group
#define gettext_noop(String) String

@{
  static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  @dots{}
@}
@end group
@end example

Please convince yourself that the string which is written by
@code{fputs} is translated in any case. How to make @code{xgettext} aware of
the additional keyword @code{gettext_noop} is explained in @ref{xgettext
Invocation}.

The above is of course not the only solution. You could also come up
with the following one:

@example
@group
#define gettext_noop(String) String

@{
  static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  @dots{}
@}
@end group
@end example

But this has a drawback. The programmer has to take care that
he uses @code{gettext_noop} for the string @code{"a default message"}.
A use of @code{gettext} could have in rare cases unpredictable results.

One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case. But this analysis is
generally not very difficult. If it turns out to be difficult in some
situation, you can use this second method there.

@node Bug Report Address
@section Letting Users Report Translation Bugs

Code sometimes has bugs, but translations sometimes have bugs too. The
users need to be able to report them.
Reporting translation bugs to the
programmer or maintainer of a package is not very useful, since the
maintainer must never change a translation, except on behalf of the
translator. Hence the translation bugs must be reported to the
translators.

Here is a way to organize this so that the maintainer does not need to
forward translation bug reports, nor even keep a list of the addresses of
the translators or their translation teams.

Every program has a place where it shows the bug report address. For
GNU programs, it is the code which handles the ``--help'' option,
typically in a function called ``usage''. In this place, instruct the
translator to add her own bug reporting address. For example, if that
code has a statement

@example
@group
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
@end group
@end example

you can add some translator instructions like this:

@example
@group
/* TRANSLATORS: The placeholder indicates the bug-reporting address
   for this package. Please add _another line_ saying
   "Report translation bugs to <...>\n" with the address for translation
   bugs (typically your translation team's web or email address). */
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
@end group
@end example

These will be extracted by @samp{xgettext}, leading to a .pot file that
contains this:

@example
@group
#. TRANSLATORS: The placeholder indicates the bug-reporting address
#. for this package. Please add _another line_ saying
#. "Report translation bugs to <...>\n" with the address for translation
#. bugs (typically your translation team's web or email address).
#: src/hello.c:178
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""
@end group
@end example

@node Names
@section Marking Proper Names for Translation

Should names of persons, cities, locations etc. be marked for translation
or not? People who only know languages that can be written with Latin
letters (English, Spanish, French, German, etc.) are tempted to say ``no'',
because names usually do not change when transported between these languages.
However, in general when translating from one script to another, names
are translated too, usually phonetically or by transliteration. For
example, Russian or Greek names are converted to the Latin alphabet when
being translated to English, and English or French names are converted
to the Katakana script when being translated to Japanese. This is
necessary because the speakers of the target language in general cannot
read the script the name is originally written in.

As a programmer, you should therefore make sure that names are marked
for translation, with a special comment telling the translators that it
is a proper name and how to pronounce it. In its simple form, it looks
like this:

@example
@group
printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name. See the gettext
           manual, section Names. Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar". */
        _("Francois Pinard"));
@end group
@end example

@noindent
The GNU gnulib library offers a module @samp{propername}
(@url{https://www.gnu.org/software/gnulib/MODULES.html#module=propername})
which takes care to automatically append the original name, in parentheses,
to the translated name.
For names that cannot be written in ASCII, it
also frees the translator from the task of entering the appropriate non-ASCII
characters if no script change is needed. In this more comfortable form,
it looks like this:

@example
@group
printf (_("Written by %s and %s.\n"),
        proper_name ("Ulrich Drepper"),
        /* TRANSLATORS: This is a proper name. See the gettext
           manual, section Names. Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar". */
        proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));
@end group
@end example

@noindent
You can also write the original name directly in Unicode (rather than with
Unicode escapes or HTML entities) and denote the pronunciation using the
International Phonetic Alphabet (see
@url{https://en.wikipedia.org/wiki/International_Phonetic_Alphabet}).

As a translator, you should use some care when translating names, because
it is frustrating if people see their names mutilated or distorted.

If your language uses the Latin script, all you need to do is to reproduce
the name as perfectly as you can within the usual character set of your
language. In this particular case, this means to provide a translation
containing the c-cedilla character. If your language uses a different
script and the people speaking it don't usually read Latin words, it means
transliteration. If the programmer used the simple case, you should still
give, in parentheses, the original writing of the name -- for the sake of
the people that do read the Latin script. If the programmer used the
@samp{propername} module mentioned above, you don't need to give the original
writing of the name in parentheses, because the program will already do so.
Here is an example, using Greek as the target script:

@example
@group
#. This is a proper name. See the gettext
#. manual, section Names. Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
       " (Francois Pinard)"
@end group
@end example

Because translation of names is such a sensitive domain, it is a good
idea to test your translation before submitting it.

@node Libraries
@section Preparing Library Sources

When you are preparing a library, not a program, for the use of
@code{gettext}, only a few details are different. Here we assume that
the library has a translation domain and a POT file of its own. (If
it uses the translation domain and POT file of the main program, then
the previous sections apply without changes.)

@enumerate
@item
The library code doesn't call @code{setlocale (LC_ALL, "")}. It's the
responsibility of the main program to set the locale. The library's
documentation should mention this fact, so that developers of programs
using the library are aware of it.

@item
The library code doesn't call @code{textdomain (PACKAGE)}, because it
would interfere with the text domain set by the main program.

@item
The initialization code for a program was

@smallexample
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
@end smallexample

@noindent
For a library it is reduced to

@smallexample
  bindtextdomain (PACKAGE, LOCALEDIR);
@end smallexample

@noindent
If your library's API doesn't already have an initialization function,
you need to create one, containing at least the @code{bindtextdomain}
invocation. However, you usually don't need to export and document this
initialization function: It is sufficient that all entry points of the
library call the initialization function if it hasn't been called before.
The typical idiom used to achieve this is a static boolean variable that
indicates whether the initialization function has been called. Like this:

@example
@group
static bool libfoo_initialized;

static void
libfoo_initialize (void)
@{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
@}

/* This function is part of the exported API. */
struct foo *
create_foo (...)
@{
  /* Must ensure the initialization is performed. */
  if (!libfoo_initialized)
    libfoo_initialize ();
  ...
@}

/* This function is part of the exported API. The argument must be
   non-NULL and have been created through create_foo(). */
int
foo_refcount (struct foo *argument)
@{
  /* No need to invoke the initialization function here, because
     create_foo() must already have been called before. */
  ...
@}
@end group
@end example

@item
The usual declaration of the @samp{_} macro in each source file was

@smallexample
#include <libintl.h>
#define _(String) gettext (String)
@end smallexample

@noindent
for a program.
For a library, which has its own translation domain,
it reads like this:

@smallexample
#include <libintl.h>
#define _(String) dgettext (PACKAGE, String)
@end smallexample

In other words, @code{dgettext} is used instead of @code{gettext}.
Similarly, the @code{dngettext} function should be used in place of the
@code{ngettext} function.
@end enumerate

@node Template
@chapter Making the PO Template File
@cindex PO template file

After preparing the sources, the programmer creates a PO template file.
This section explains how to use @code{xgettext} for this purpose.

@code{xgettext} creates a file named @file{@var{domainname}.po}. You
should then rename it to @file{@var{domainname}.pot}. (Why doesn't
@code{xgettext} create it under the name @file{@var{domainname}.pot}
right away? The answer is: for historical reasons. When @code{xgettext}
was specified, the distinction between a PO file and PO file template
was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.)

@c FIXME: Rewrite.

@menu
* xgettext Invocation::         Invoking the @code{xgettext} Program
@end menu

@node xgettext Invocation
@section Invoking the @code{xgettext} Program

@include xgettext.texi

@node Creating
@chapter Creating a New PO File
@cindex creating a new PO file

When starting a new translation, the translator creates a file called
@file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template
file with modifications in the initial comments (at the beginning of the file)
and in the header entry (the first entry, near the beginning of the file).

The easiest way to do so is by use of the @samp{msginit} program.
For example:

@example
$ cd @var{PACKAGE}-@var{VERSION}
$ cd po
$ msginit
@end example

The alternative way is to do the copy and modifications by hand.
To do so, the translator copies @file{@var{package}.pot} to
@file{@var{LANG}.po}. Then she modifies the initial comments and
the header entry of this file.

@menu
* msginit Invocation::          Invoking the @code{msginit} Program
* Header Entry::                Filling in the Header Entry
@end menu

@node msginit Invocation
@section Invoking the @code{msginit} Program

@include msginit.texi

@node Header Entry
@section Filling in the Header Entry
@cindex header entry of a PO file

The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
information. This can be done in any text editor; if Emacs is used
and it switched to PO mode automatically (because it has recognized
the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.

Modifying the header entry can already be done using PO mode: in Emacs,
type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
entry. You should fill in the following fields.

@table @asis
@item Project-Id-Version
This is the name and version of the package. Fill it in if it has not
already been filled in by @code{xgettext}.

@item Report-Msgid-Bugs-To
This has already been filled in by @code{xgettext}. It contains an email
address or URL where you can report bugs in the untranslated strings:

@itemize -
@item Strings which are not entire sentences, see the maintainer guidelines
in @ref{Preparing Strings}.
@item Strings which use unclear terms or require additional context to be
understood.
@item Strings which make invalid assumptions about notation of date, time or
money.
@item Pluralisation problems.
@item Incorrect English spelling.
@item Incorrect formatting.
@end itemize

@item POT-Creation-Date
This has already been filled in by @code{xgettext}.

@item PO-Revision-Date
You don't need to fill this in. It will be filled by the PO file editor
when you save the file.

@item Last-Translator
Fill in your name and email address (without double quotes).

@item Language-Team
Fill in the English name of the language, and the email address or
homepage URL of the language team you are part of.

Before starting a translation, it is a good idea to get in touch with
your translation team, not only to make sure you don't do duplicated work,
but also to coordinate difficult linguistic issues.

@cindex list of translation teams, where to find
In the Free Translation Project, each translation team has its own mailing
list. The up-to-date list of teams can be found at the Free Translation
Project's homepage, @uref{https://translationproject.org/}, in the "Teams"
area.

@item Language
@c The purpose of this field is to make it possible to automatically
@c - convert PO files to translation memory,
@c - initialize a spell checker based on the PO file,
@c - perform language specific checks.
Fill in the language code of the language. This can be in one of three
forms:

@itemize -
@item
@samp{@var{ll}}, an @w{ISO 639} two-letter language code (lowercase).
See @ref{Language Codes} for the list of codes.

@item
@samp{@var{ll}_@var{CC}}, where @samp{@var{ll}} is an @w{ISO 639} two-letter
language code (lowercase) and @samp{@var{CC}} is an @w{ISO 3166} two-letter
country code (uppercase). The country code specification is not redundant:
Some languages have dialects in different countries. For example,
@samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil.
The country
code serves to distinguish the dialects. See @ref{Language Codes} and
@ref{Country Codes} for the lists of codes.

@item
@samp{@var{ll}_@var{CC}@@@var{variant}}, where @samp{@var{ll}} is an
@w{ISO 639} two-letter language code (lowercase), @samp{@var{CC}} is an
@w{ISO 3166} two-letter country code (uppercase), and @samp{@var{variant}} is
a variant designator. The variant designator (lowercase) can be a script
designator, such as @samp{latin} or @samp{cyrillic}.
@end itemize

The naming convention @samp{@var{ll}_@var{CC}} is also the way locales are
named on systems based on GNU libc. But there are three important differences:

@itemize @bullet
@item
In this PO file field, but not in locale names, @samp{@var{ll}_@var{CC}}
combinations denoting a language's main dialect are abbreviated as
@samp{@var{ll}}. For example, @samp{de} is equivalent to @samp{de_DE}
(German as spoken in Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as
spoken in Portugal) in this context.

@item
In this PO file field, suffixes like @samp{.@var{encoding}} are not used.

@item
In this PO file field, variant designators that are not relevant to message
translation, such as @samp{@@euro}, are not used.
@end itemize

So, if your locale name is @samp{de_DE.UTF-8}, the language specification in
PO files is just @samp{de}.

@item Content-Type
@cindex encoding of PO files
@cindex charset of PO files
Replace @samp{CHARSET} with the character encoding used for your language,
in your locale, or UTF-8. This field is needed for correct operation of the
@code{msgmerge} and @code{msgfmt} programs, as well as for users whose
locale's character encoding differs from yours (see @ref{Charset conversion}).

@cindex @code{locale} program
You get the character encoding of your locale by running the shell command
@samp{locale charmap}. If the result is @samp{C} or @samp{ANSI_X3.4-1968},
which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
locale is not correctly configured. In this case, ask your translation
team which charset to use. @samp{ASCII} is not usable for any language
except Latin.

@cindex encoding list
Because the PO files must be portable to operating systems with less advanced
internationalization facilities, the character encodings that can be used
are limited to those supported by both GNU @code{libc} and GNU
@code{libiconv}. These are:
@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14},
@code{ISO-8859-15},
@code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T},
@code{CP850}, @code{CP866}, @code{CP874},
@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}.

@c This data is taken from glibc/localedata/SUPPORTED.
@cindex Linux
In the GNU system, the following encodings are frequently used for the
corresponding languages.

@cindex encoding for your language
@itemize
@item @code{ISO-8859-1} for
Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
English, Estonian, Faroese, Finnish, French, Galician, German,
Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
Walloon,
@item @code{ISO-8859-2} for
Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
Slovenian,
@item @code{ISO-8859-3} for Maltese,
@item @code{ISO-8859-5} for Macedonian, Serbian,
@item @code{ISO-8859-6} for Arabic,
@item @code{ISO-8859-7} for Greek,
@item @code{ISO-8859-8} for Hebrew,
@item @code{ISO-8859-9} for Turkish,
@item @code{ISO-8859-13} for Latvian, Lithuanian, Maori,
@item @code{ISO-8859-14} for Welsh,
@item @code{ISO-8859-15} for
Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
Italian, Portuguese, Spanish, Swedish, Walloon,
@item @code{KOI8-R} for Russian,
@item @code{KOI8-U} for Ukrainian,
@item @code{KOI8-T} for Tajik,
@item @code{CP1251} for Bulgarian, Belarusian,
@item @code{GB2312}, @code{GBK}, @code{GB18030}
for simplified writing of Chinese,
@item @code{BIG5}, @code{BIG5-HKSCS}
for traditional writing of Chinese,
@item @code{EUC-JP} for Japanese,
@item @code{EUC-KR} for Korean,
@item @code{TIS-620} for Thai,
@item @code{GEORGIAN-PS} for Georgian,
@item @code{UTF-8} for any language, including those listed above.
@end itemize

@cindex quote characters, use in PO files
@cindex quotation marks
When single quote characters or double quote characters are used in
translations for your language, and your locale's encoding is one of the
ISO-8859-* charsets, it is best if you create your PO files in UTF-8
encoding, instead of your locale's encoding.
This is because in UTF-8
the real quote characters can be represented (single quote characters:
U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
the ISO-8859-* charsets has them all. Users in UTF-8 locales will see the
real quote characters, whereas users in ISO-8859-* locales will see the
vertical apostrophe and the vertical double quote instead (because that's
what the character set conversion will transliterate them to).

@cindex @code{xmodmap} program, and typing quotation marks
To enter such quote characters under X11, you can change your keyboard
mapping using the @code{xmodmap} program. The X11 names of the quote
characters are "leftsinglequotemark", "rightsinglequotemark",
"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
"doublelowquotemark".

Note that only recent versions of GNU Emacs support the UTF-8 encoding:
Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't
support the UTF-8 encoding.

The character encoding name can be written in either upper or lower case.
Usually upper case is preferred.

@item Content-Transfer-Encoding
Set this to @code{8bit}.

@item Plural-Forms
This field is optional. It is only needed if the PO file has plural forms.
You can find them by searching for the @samp{msgid_plural} keyword. The
format of the plural forms field is described in @ref{Plural forms} and
@ref{Translating plural forms}.
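
As an illustration of the fields described in this table, a filled-in
header entry for a German translation might look roughly like this. The
package name, addresses and dates are made up for the example; only the
field names and the overall layout are meant to be taken literally.

@example
msgid ""
msgstr ""
"Project-Id-Version: hello 1.0\n"
"Report-Msgid-Bugs-To: bug-hello@@example.org\n"
"POT-Creation-Date: 2020-01-01 12:00+0100\n"
"PO-Revision-Date: 2020-01-15 18:30+0100\n"
"Last-Translator: Jane Translator <jane@@example.org>\n"
"Language-Team: German <team-de@@example.org>\n"
"Language: de\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
@end example

The @samp{Plural-Forms} value shown is the one commonly used for languages
that, like German and English, distinguish only singular and plural.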
@end table

@node Updating
@chapter Updating Existing PO Files

@menu
* msgmerge Invocation::         Invoking the @code{msgmerge} Program
@end menu

@node msgmerge Invocation
@section Invoking the @code{msgmerge} Program

@include msgmerge.texi

@node Editing
@chapter Editing PO Files
@cindex Editing PO Files

@menu
* KBabel::                      KDE's PO File Editor
* Gtranslator::                 GNOME's PO File Editor
* PO Mode::                     Emacs's PO File Editor
* Compendium::                  Using Translation Compendia
@end menu

@node KBabel
@section KDE's PO File Editor
@cindex KDE PO file editor

@node Gtranslator
@section GNOME's PO File Editor
@cindex GNOME PO file editor

@node PO Mode
@section Emacs's PO File Editor
@cindex Emacs PO Mode

@c FIXME: Rewrite.

For those of you who are
lucky enough to use Emacs, PO mode has been specifically created
to provide a cozy environment for editing or modifying PO files.
While editing a PO file, PO mode allows for the easy browsing of
auxiliary and compendium PO files, as well as for following references into
the set of C program sources from which PO files have been derived.
It has a few special features, among which are the interactive marking
of program strings as translatable, and the validation of PO files
with easy repositioning to PO file lines showing errors.

To begin with, besides the main PO mode commands
(@pxref{Main PO Commands}), you should know how to move between entries
(@pxref{Entry Positioning}), and how to handle untranslated entries
(@pxref{Untranslated Entries}).

@menu
* Installation::                Completing GNU @code{gettext} Installation
* Main PO Commands::            Main Commands
* Entry Positioning::           Entry Positioning
* Normalizing::                 Normalizing Strings in Entries
* Translated Entries::          Translated Entries
* Fuzzy Entries::               Fuzzy Entries
* Untranslated Entries::        Untranslated Entries
* Obsolete Entries::            Obsolete Entries
* Modifying Translations::      Modifying Translations
* Modifying Comments::          Modifying Comments
* Subedit::                     Mode for Editing Translations
* C Sources Context::           C Sources Context
* Auxiliary::                   Consulting Auxiliary PO Files
@end menu

@node Installation
@subsection Completing GNU @code{gettext} Installation

@cindex installing @code{gettext}
@cindex @code{gettext} installation
Once you have received, unpacked, configured and compiled the GNU
@code{gettext} distribution, the @samp{make install} command puts in
place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
@code{msgmerge}, as well as their available message catalogs. To
top off a comfortable installation, you might also want to make the
PO mode available to your Emacs users.

@emindex @file{.emacs} customizations
@emindex installing PO mode
During the installation of the PO mode, you might want to modify your
file @file{.emacs}, once and for all, so it contains a few lines looking
like:

@example
(setq auto-mode-alist
      (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
@end example

Later, whenever you edit some @file{.po}
file, or any file having the string @samp{.po.} within its name,
Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
automatically activates PO mode commands for the associated buffer.
The string @emph{PO} appears in the mode line for any buffer for
which PO mode is active. Many PO files may be active at once in a
single Emacs session.

If you are using Emacs version 20 or newer, and have already installed
the appropriate international fonts on your system, you may also tell
Emacs how to determine automatically the coding system of every PO file.
This will often (but not always) cause the necessary fonts to be loaded
and used for displaying the translations on your Emacs screen. For this
to happen, add the lines:

@example
(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
                            'po-find-file-coding-system)
(autoload 'po-find-file-coding-system "po-mode")
@end example

@noindent
to your @file{.emacs} file. If, with this, you still see boxes instead
of international characters, try a different font set (via Shift Mouse
button 1).

@node Main PO Commands
@subsection Main PO mode Commands

@cindex PO mode (Emacs) commands
@emindex commands
After setting up Emacs with something similar to the lines in
@ref{Installation}, PO mode is activated for a window when Emacs finds a
PO file in that window. This puts the window read-only and establishes a
po-mode-map. PO mode is a genuine Emacs mode, not derived
from text mode in any way. Functions found on @code{po-mode-hook},
if any, will be executed.

When PO mode is active in a window, the letters @samp{PO} appear
in the mode line for that window. The mode line also displays how
many entries of each kind are held in the PO file. For example,
the string @samp{132t+3f+10u+2o} would tell the translator that the
PO file contains 132 translated entries (@pxref{Translated Entries}),
3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
Entries}).
Items with a zero count are not shown. So, in this example, if
the fuzzy entries were unfuzzied, the untranslated entries were translated
and the obsolete entries were deleted, the mode line would merely display
@samp{145t} for the counters.

The main PO commands are those which do not fit into the other categories of
subsequent sections. These allow for quitting PO mode or for managing windows
in special ways.

@table @kbd
@item _
@efindex _@r{, PO Mode command}
Undo last modification to the PO file (@code{po-undo}).

@item Q
@efindex Q@r{, PO Mode command}
Quit processing and save the PO file (@code{po-quit}).

@item q
@efindex q@r{, PO Mode command}
Quit processing, possibly after confirmation (@code{po-confirm-and-quit}).

@item 0
@efindex 0@r{, PO Mode command}
Temporarily leave the PO file window (@code{po-other-window}).

@item ?
@itemx h
@efindex ?@r{, PO Mode command}
@efindex h@r{, PO Mode command}
Show help about PO mode (@code{po-help}).

@item =
@efindex =@r{, PO Mode command}
Give some PO file statistics (@code{po-statistics}).

@item V
@efindex V@r{, PO Mode command}
Batch validate the format of the whole PO file (@code{po-validate}).

@end table

@efindex _@r{, PO Mode command}
@efindex po-undo@r{, PO Mode command}
The command @kbd{_} (@code{po-undo}) interfaces to the Emacs
@emph{undo} facility. @xref{Undo, , Undoing Changes, emacs, The Emacs
Editor}. Each time @kbd{_} is typed, modifications which the translator
did to the PO file are undone a little more. For the purpose of
undoing, each PO mode command is atomic. This is especially true for
the @kbd{@key{RET}} command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
involved several actions.
However, while in the editing window, one
can undo the editing work in much smaller steps.

@efindex Q@r{, PO Mode command}
@efindex q@r{, PO Mode command}
@efindex po-quit@r{, PO Mode command}
@efindex po-confirm-and-quit@r{, PO Mode command}
The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
(@code{po-confirm-and-quit}) are used when the translator is done with the
PO file. The former is a bit less verbose than the latter. If the file
has been modified, it is saved to disk first. In both cases, and prior to
all this, the commands check if any untranslated messages remain in the
PO file and, if so, the translator is asked if she really wants to leave
off working with this PO file. This is the preferred way of getting rid
of an Emacs PO file buffer. Merely killing it through the usual command
@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.

@efindex 0@r{, PO Mode command}
@efindex po-other-window@r{, PO Mode command}
The command @kbd{0} (@code{po-other-window}) is another, softer way,
to leave PO mode, temporarily. It just moves the cursor to some other
Emacs window, and pops one if necessary. For example, if the translator
just got PO mode to show some source context in some other window, she might
discover some apparent bug in the program source that needs correction.
This command allows the translator to switch roles, become a programmer,
and have the cursor right in the window containing the program she
wants to modify. By later getting the cursor back
in the PO file window, or by asking Emacs to edit this file once again,
PO mode is then recovered.

@efindex ?@r{, PO Mode command}
@efindex h@r{, PO Mode command}
@efindex po-help@r{, PO Mode command}
The command @kbd{h} (@code{po-help}) displays a summary of all available PO
mode commands.
The translator should then type any character to resume
normal PO mode operations. The command @kbd{?} has the same effect
as @kbd{h}.

@efindex =@r{, PO Mode command}
@efindex po-statistics@r{, PO Mode command}
The command @kbd{=} (@code{po-statistics}) computes the total number of
entries in the PO file, the ordinal of the current entry (counted from
1), the number of untranslated entries, the number of obsolete entries,
and displays all these numbers.

@efindex V@r{, PO Mode command}
@efindex po-validate@r{, PO Mode command}
The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in
checking and verbose
mode over the current PO file. This command first offers to save the
current PO file on disk. The @code{msgfmt} tool, from GNU @code{gettext},
has the purpose of creating a MO file out of a PO file, and PO mode uses
the features of this program for checking the overall format of a PO file,
as well as all individual entries.

@efindex next-error@r{, stepping through PO file validation results}
The program @code{msgfmt} runs asynchronously with Emacs, so the
translator regains control immediately while her PO file is being studied.
Error output is collected in the Emacs @samp{*compilation*} buffer,
displayed in another window. The regular Emacs command @kbd{C-x `}
(@code{next-error}), as well as other usual compile commands, allow the
translator to reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.

@node Entry Positioning
@subsection Entry Positioning

@emindex current entry of a PO file
The cursor in a PO file window is almost always part of
an entry. The only exceptions are the special case when the cursor
is after the last entry in the file, or when the PO file is
empty.
The entry where the cursor is found is said to be the
current entry. Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file; it also selects the entry on which commands operate.

@emindex moving through a PO file
Some PO mode commands alter the position of the cursor in a specialized
way. A few of those special-purpose positioning commands are described here;
the others are described in the following sections (for a complete list try
@kbd{C-h m}):

@table @kbd

@item .
@efindex .@r{, PO Mode command}
Redisplay the current entry (@code{po-current-entry}).

@item n
@efindex n@r{, PO Mode command}
Select the entry after the current one (@code{po-next-entry}).

@item p
@efindex p@r{, PO Mode command}
Select the entry before the current one (@code{po-previous-entry}).

@item <
@efindex <@r{, PO Mode command}
Select the first entry in the PO file (@code{po-first-entry}).

@item >
@efindex >@r{, PO Mode command}
Select the last entry in the PO file (@code{po-last-entry}).

@item m
@efindex m@r{, PO Mode command}
Record the location of the current entry for later use
(@code{po-push-location}).

@item r
@efindex r@r{, PO Mode command}
Return to a previously saved entry location (@code{po-pop-location}).

@item x
@efindex x@r{, PO Mode command}
Exchange the current entry location with the previously saved one
(@code{po-exchange-location}).

@end table

@efindex .@r{, PO Mode command}
@efindex po-current-entry@r{, PO Mode command}
Any Emacs command able to reposition the cursor may be used
to select the current entry in PO mode, including commands which
move by characters, lines, paragraphs, screens or pages, and search
commands.
However, there is a kind of standard way to display the
current entry in PO mode, which usual Emacs commands moving
the cursor do not especially try to enforce. The command @kbd{.}
(@code{po-current-entry}) has the sole purpose of redisplaying the
current entry properly, after the current entry has been changed by
means external to PO mode, or the Emacs screen otherwise altered.

It is yet to be decided if PO mode helps the translator, or otherwise
irritates her, by forcing a rigid window disposition while she
is doing her work. We originally had quite precise ideas about
how windows should behave, but on the other hand, anyone used to
Emacs is often happy to keep full control. Maybe a fixed window
disposition might be offered as a PO mode option that the translator
might activate or deactivate at will, so it could be offered on an
experimental basis. If nobody feels a real need for using it, or
a compulsion for writing it, we should drop this whole idea.
The incentive for doing it should come from translators rather than
programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers @emph{thinking} about
how @emph{others} should do translation.

@efindex n@r{, PO Mode command}
@efindex po-next-entry@r{, PO Mode command}
@efindex p@r{, PO Mode command}
@efindex po-previous-entry@r{, PO Mode command}
The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
(@code{po-previous-entry}) move the cursor to the entry following,
or preceding, the current one. If @kbd{n} is given while the
cursor is on the last entry of the PO file, or if @kbd{p}
is given while the cursor is on the first entry, no move is done.

@efindex <@r{, PO Mode command}
@efindex po-first-entry@r{, PO Mode command}
@efindex >@r{, PO Mode command}
@efindex po-last-entry@r{, PO Mode command}
The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
(@code{po-last-entry}) move the cursor to the first entry, or last
entry, of the PO file. When the cursor is located past the last
entry in a PO file, most PO mode commands will return an error saying
@samp{After last entry}. Moreover, the commands @kbd{<} and @kbd{>}
have the special property of being able to work even when the cursor
is not within any PO file entry, and one may use them for nicely
correcting this situation. But even these commands will fail on a
truly empty PO file. There are development plans for the PO mode for it
to interactively fill an empty PO file from sources. @xref{Marking}.

The translator may decide, before working on the translation of
a particular entry, that she needs to browse the remainder of the
PO file, maybe for finding the terminology or phraseology used
in related entries. She can of course use the standard Emacs idioms
for saving the current cursor location in some register, and use that
register for getting back, or else, use the location ring.

@efindex m@r{, PO Mode command}
@efindex po-push-location@r{, PO Mode command}
@efindex r@r{, PO Mode command}
@efindex po-pop-location@r{, PO Mode command}
PO mode offers another approach, by which cursor locations may be saved
onto a special stack. The command @kbd{m} (@code{po-push-location})
merely adds the location of current entry to the stack, pushing
the already saved locations under the new one. The command
@kbd{r} (@code{po-pop-location}) consumes the top stack element and
repositions the cursor to the entry associated with that top element.
This position is then lost, for the next @kbd{r} will move the cursor
to the previously saved location, and so on until no locations remain
on the stack.

If the translator wants the position to be kept on the location stack,
maybe for taking a look at the entry associated with the top
element, then go elsewhere with the intent of getting back later, she
ought to use @kbd{m} immediately after @kbd{r}.

@efindex x@r{, PO Mode command}
@efindex po-exchange-location@r{, PO Mode command}
The command @kbd{x} (@code{po-exchange-location}) simultaneously
repositions the cursor to the entry associated with the top element of
the stack of saved locations, and replaces that top element with the
location of the current entry before the move. Consequently, repeating
the @kbd{x} command alternates between two entries.
For achieving this, the translator will position the cursor on the
first entry, use @kbd{m}, then position to the second entry, and
merely use @kbd{x} for making the switch.

@node Normalizing
@subsection Normalizing Strings in Entries
@cindex string normalization in entries

There are many different ways for encoding a particular string into a
PO file entry, because there are so many different ways to split and
quote multi-line strings, and even, to represent special characters
by backslash escape sequences. Some features of PO mode rely on
the ability for PO mode to scan an already existing PO file for a
particular string encoded into the @code{msgid} field of some entry.
Even if PO mode has internally all the built-in machinery for
implementing this recognition easily, doing it fast is technically
difficult. To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.
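
To illustrate why a canonical form helps, here are two @code{msgid}
fields that denote exactly the same string, because adjacent quoted
strings in a PO file are concatenated, as in C. The message text is
made up for the example.

@example
msgid "Hello, world!\n"
@end example

@example
msgid ""
"Hello, "
"world!\n"
@end example

A lookup that compares the raw text of the fields would treat these two
as different, even though they represent the same string.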

A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical representation.
Having both @code{xgettext} and PO mode converging towards a uniform
way of representing equivalent strings would be useful, as the internal
normalization needed by PO mode could be automatically satisfied
when using @code{xgettext} from GNU @code{gettext}. An explicit
PO mode normalization should then be only necessary for PO files
imported from elsewhere, or for when the convention itself evolves.

So, for achieving normalization of at least the strings of a given
PO file needing a canonical representation, the following PO mode
command is available:

@emindex string normalization in entries
@table @kbd
@item M-x po-normalize
@efindex po-normalize@r{, PO Mode command}
Tidy the whole PO file by making entries more uniform.

@end table

The special command @kbd{M-x po-normalize}, which has no associated
keys, revises all entries, ensuring that strings of both original
and translated entries use uniform internal quoting in the PO file.
It also removes any crumb after the last entry. This command may be
useful for PO files freshly imported from elsewhere, or if we ever
improve on the canonical quoting format we use. This canonical format
is not only meant for getting cleaner PO files, but also for greatly
speeding up @code{msgid} string lookup for some other PO mode commands.

@kbd{M-x po-normalize} presently makes three passes over the entries.
The first implements heuristics for converting PO files for GNU
@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
fields were using K&R style C string syntax for multi-line strings.
These heuristics may fail for comments not related to obsolete
entries and ending with a backslash; they also depend on subsequent
passes for finalizing the proper commenting of continued lines for
obsolete entries. This first pass might disappear once all older PO
files have been adjusted. The second and third passes normalize
all @code{msgid} and @code{msgstr} strings respectively. They also
clean out those trailing backslashes used by XView's @code{msgfmt}
for continued lines.

@cindex importing PO files
Having such an explicit normalizing command allows for importing PO
files from other sources, but also eases the evolution of the current
convention, evolution driven mostly by aesthetic concerns, as of now.
It is easy to make suggested adjustments at a later time, as the
normalizing command and eventually, other GNU @code{gettext} tools
should greatly automate conformance. A description of the canonical
string format is given below, for the particular benefit of those not
having Emacs handy, and who would nevertheless want to handcraft
their PO files in nice ways.

@cindex multi-line strings
Right now, in PO mode, strings are single line or multi-line. A string
goes multi-line if and only if it has @emph{embedded} newlines, that
is, if it matches @samp{[^\n]\n+[^\n]}. So, we would have:

@example
msgstr "\n\nHello, world!\n\n\n"
@end example

but, replacing the space by a newline, this becomes:

@example
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"
@end example

We are deliberately using a caricatural example, here, to make the
point clearer. Usually, multi-lines are not that bad looking.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, for
@w{@var{n} > 1}, the last @var{n}-1 newlines would go together on a
separate string), so making the previous example appear:

@example
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"
@end example

There are a few little points about string normalization that are
still undecided; they will be documented in this manual once these
questions settle.

@node Translated Entries
@subsection Translated Entries
@cindex translated entries

Each PO file entry for which the @code{msgstr} field has been filled with
a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
is said to be a @dfn{translated} entry. Only translated entries will
later be compiled by GNU @code{msgfmt} and become usable in programs.
Other entry types will be excluded; translation will not occur for them.

@emindex moving by translated entries
Some commands are more specifically related to translated entry processing.

@table @kbd
@item t
@efindex t@r{, PO Mode command}
Find the next translated entry (@code{po-next-translated-entry}).

@item T
@efindex T@r{, PO Mode command}
Find the previous translated entry (@code{po-previous-translated-entry}).

@end table

@efindex t@r{, PO Mode command}
@efindex po-next-translated-entry@r{, PO Mode command}
@efindex T@r{, PO Mode command}
@efindex po-previous-translated-entry@r{, PO Mode command}
The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T}
(@code{po-previous-translated-entry}) move forwards or backwards, chasing
for a translated entry. If none is found, the search is extended and
wraps around in the PO file buffer.
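
For illustration, a translated entry in the sense defined above could
look like the following. The strings and the source reference are made
up for this sketch; what matters is that @code{msgstr} is not empty
and that no fuzzy attribute is attached to the entry.

@example
#: src/main.c:42
msgid "File not found"
msgstr "Fichier introuvable"
@end example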

@evindex po-auto-fuzzy-on-edit@r{, PO Mode variable}
Translated entries usually result from the translator having edited in
a translation for them (@pxref{Modifying Translations}). However, if the
variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, an entry that
receives a new translation first becomes a fuzzy entry, which ought to
be unfuzzied later before becoming an official, genuine translated entry.
@xref{Fuzzy Entries}.

@node Fuzzy Entries
@subsection Fuzzy Entries
@cindex fuzzy entries

@cindex attributes of a PO file entry
@cindex attribute, fuzzy
Each PO file entry may have a set of @dfn{attributes}, which are
qualities given a name and explicitly associated with the translation,
using a special system comment. One of these attributes
has the name @code{fuzzy}, and entries having this attribute are said
to have a fuzzy translation. They are called fuzzy entries, for short.

Fuzzy entries, even if they account for translated entries for
most other purposes, usually call for revision by the translator.
Those may be produced by applying the program @code{msgmerge} to
update an older translated PO file according to a new PO template
file, when this tool hypothesises that some new @code{msgid} has
been modified only slightly from an older one, and chooses to pair
what it thinks to be the old translation with the new modified entry.
The slight alteration in the original string (the @code{msgid} string)
should often be reflected in the translated string, and this requires
the intervention of the translator. For this reason, @code{msgmerge}
might mark some entries as being fuzzy.

@emindex moving by fuzzy entries
Also, the translator may decide herself to mark an entry as fuzzy
for her own convenience, when she wants to remember that the entry
has to be revisited later.
So, some commands are more specifically 3856related to fuzzy entry processing. 3857 3858@table @kbd 3859@item f 3860@efindex f@r{, PO Mode command} 3861@c better append "-entry" all the time. -ke- 3862Find the next fuzzy entry (@code{po-next-fuzzy-entry}). 3863 3864@item F 3865@efindex F@r{, PO Mode command} 3866Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}). 3867 3868@item @key{TAB} 3869@efindex TAB@r{, PO Mode command} 3870Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}). 3871 3872@end table 3873 3874@efindex f@r{, PO Mode command} 3875@efindex po-next-fuzzy-entry@r{, PO Mode command} 3876@efindex F@r{, PO Mode command} 3877@efindex po-previous-fuzzy-entry@r{, PO Mode command} 3878The commands @kbd{f} (@code{po-next-fuzzy-entry}) and @kbd{F} 3879(@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for 3880a fuzzy entry. If none is found, the search is extended and wraps 3881around in the PO file buffer. 3882 3883@efindex TAB@r{, PO Mode command} 3884@efindex po-unfuzzy@r{, PO Mode command} 3885@evindex po-auto-select-on-unfuzzy@r{, PO Mode variable} 3886The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy 3887attribute associated with an entry, usually leaving it translated. 3888Further, if the variable @code{po-auto-select-on-unfuzzy} has not 3889the @code{nil} value, the @kbd{@key{TAB}} command will automatically chase 3890for another interesting entry to work on. The initial value of 3891@code{po-auto-select-on-unfuzzy} is @code{nil}. 3892 3893The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}. However, 3894if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry 3895edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to 3896ensure some kind of double check, later. In this case, the usual paradigm 3897is that an entry becomes fuzzy (if not already) whenever the translator 3898modifies it. 
If she is satisfied with the translation, she then uses
@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
at the same time. If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
to chase another entry, leaving the entry fuzzy.

@efindex DEL@r{, PO Mode command}
@efindex po-fade-out-entry@r{, PO Mode command}
The translator may also use the @kbd{@key{DEL}} command
(@code{po-fade-out-entry}) over any translated entry to mark it as being
fuzzy, when she wants to leave an easy reminder that she intends to come
back to this entry later.

Also, when the time comes to quit working on a PO file buffer with the @kbd{q}
command, the translator is asked for confirmation if some fuzzy string
still exists.

@node Untranslated Entries
@subsection Untranslated Entries
@cindex untranslated entries

When @code{xgettext} originally creates a PO file, unless told
otherwise, it initializes the @code{msgid} field with the untranslated
string, and leaves the @code{msgstr} string empty. Such entries,
having an empty translation, are said to be @dfn{untranslated} entries.
Later, when the programmer slightly modifies some string right in
the program, this change is reflected in the PO file
by the appearance of a new untranslated entry for the modified string.

The usual commands moving from entry to entry consider untranslated
entries on the same level as active entries. Untranslated entries
are easily recognizable by the fact they end with @w{@samp{msgstr ""}}.

@emindex moving by untranslated entries
The work of the translator might be (quite naively) seen as the process
of seeking an untranslated entry, editing a translation for
it, and repeating these actions until no untranslated entries remain.
Some commands are more specifically related to untranslated entry
processing.

@table @kbd
@item u
@efindex u@r{, PO Mode command}
Find the next untranslated entry (@code{po-next-untranslated-entry}).

@item U
@efindex U@r{, PO Mode command}
Find the previous untranslated entry (@code{po-previous-untranslated-entry}).

@item k
@efindex k@r{, PO Mode command}
Turn the current entry into an untranslated one (@code{po-kill-msgstr}).

@end table

@efindex u@r{, PO Mode command}
@efindex po-next-untranslated-entry@r{, PO Mode command}
@efindex U@r{, PO Mode command}
@efindex po-previous-untranslated-entry@r{, PO Mode command}
The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U}
(@code{po-previous-untranslated-entry}) move forwards or backwards,
chasing for an untranslated entry. If none is found, the search is
extended and wraps around in the PO file buffer.

@efindex k@r{, PO Mode command}
@efindex po-kill-msgstr@r{, PO Mode command}
An entry can be turned back into an untranslated entry by
merely emptying its translation, using the command @kbd{k}
(@code{po-kill-msgstr}). @xref{Modifying Translations}.

Also, when the time comes to quit working on a PO file buffer
with the @kbd{q} command, the translator is asked for confirmation
if some untranslated string still exists.

@node Obsolete Entries
@subsection Obsolete Entries
@cindex obsolete entries

By @dfn{obsolete} PO file entries, we mean those entries which are
commented out, usually by @code{msgmerge} when it found that the
translation is not needed anymore by the package being localized.

The usual commands moving from entry to entry consider obsolete
entries on the same level as active entries. Obsolete entries are
easily recognizable by the fact that all their lines start with
@code{#}, even those lines containing @code{msgid} or @code{msgstr}.
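
As an illustration, an obsolete entry might appear in a PO file as
shown below. The strings are invented for this sketch, and the
@samp{#~} prefix is the one current GNU @code{gettext} tools write
when they comment an entry out; files coming from other tools may differ.

@example
#~ msgid "Unable to open file"
#~ msgstr "Impossible d'ouvrir le fichier"
@end example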
3983 3984Commands exist for emptying the translation or reinitializing it 3985to the original untranslated string. Commands interfacing with the 3986kill ring may force some previously saved text into the translation. 3987The user may interactively edit the translation. All these commands 3988may apply to obsolete entries, carefully leaving the entry obsolete 3989after the fact. 3990 3991@emindex moving by obsolete entries 3992Moreover, some commands are more specifically related to obsolete 3993entry processing. 3994 3995@table @kbd 3996@item o 3997@efindex o@r{, PO Mode command} 3998Find the next obsolete entry (@code{po-next-obsolete-entry}). 3999 4000@item O 4001@efindex O@r{, PO Mode command} 4002Find the previous obsolete entry (@code{po-previous-obsolete-entry}). 4003 4004@item @key{DEL} 4005@efindex DEL@r{, PO Mode command} 4006Make an active entry obsolete, or zap out an obsolete entry 4007(@code{po-fade-out-entry}). 4008 4009@end table 4010 4011@efindex o@r{, PO Mode command} 4012@efindex po-next-obsolete-entry@r{, PO Mode command} 4013@efindex O@r{, PO Mode command} 4014@efindex po-previous-obsolete-entry@r{, PO Mode command} 4015The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O} 4016(@code{po-previous-obsolete-entry}) move forwards or backwards, 4017chasing for an obsolete entry. If none is found, the search is 4018extended and wraps around in the PO file buffer. 4019 4020PO mode does not provide ways for un-commenting an obsolete entry 4021and making it active, because this would reintroduce an original 4022untranslated string which does not correspond to any marked string 4023in the program sources. This goes with the philosophy of never 4024introducing useless @code{msgid} values. 4025 4026@efindex DEL@r{, PO Mode command} 4027@efindex po-fade-out-entry@r{, PO Mode command} 4028@emindex obsolete active entry 4029@emindex comment out PO file entry 4030However, it is possible to comment out an active entry, so making 4031it obsolete. 
GNU @code{gettext} utilities will later react to the
disappearance of a translation by using the untranslated string.
The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
a little further towards annihilation. If the entry is active (it is a
translated entry), then it is first made fuzzy. If it is already fuzzy,
then the entry is merely commented out, with confirmation. If the entry
is already obsolete, then it is completely deleted from the PO file.
It is easy to recycle the translation so deleted into some other PO file
entry, usually one which is untranslated. @xref{Modifying Translations}.

Here is a quite interesting problem to solve for later development of
PO mode, for those nights you are not sleepy. The idea would be that
PO mode might become bright enough, one of these days, to make good
guesses at retrieving the most probable candidate, among all obsolete
entries, for initializing the translation of a newly appeared string.
I think it might be a quite hard problem to do this algorithmically, as
we have to develop good and efficient measures of string similarity.
Right now, PO mode leaves the decision entirely to the translator
when the time comes to find the adequate obsolete translation; it
merely tries to provide handy tools for helping her to do so.

@node Modifying Translations
@subsection Modifying Translations
@cindex editing translations
@emindex editing translations

PO mode prevents direct modification of the PO file by the usual
means Emacs gives for altering a buffer's contents. By doing so,
it aims to help the translator avoid little clerical errors
about the overall file format, or the proper quoting of strings,
as those errors would be easily made.
Other kinds of errors are 4062still possible, but some may be caught and diagnosed by the batch 4063validation process, which the translator may always trigger by the 4064@kbd{V} command. For all other errors, the translator has to rely on 4065her own judgment, and also on the linguistic reports submitted to her 4066by the users of the translated package, having the same mother tongue. 4067 4068When the time comes to create a translation, correct an error diagnosed 4069mechanically or reported by a user, the translators have to resort to 4070using the following commands for modifying the translations. 4071 4072@table @kbd 4073@item @key{RET} 4074@efindex RET@r{, PO Mode command} 4075Interactively edit the translation (@code{po-edit-msgstr}). 4076 4077@item @key{LFD} 4078@itemx C-j 4079@efindex LFD@r{, PO Mode command} 4080@efindex C-j@r{, PO Mode command} 4081Reinitialize the translation with the original, untranslated string 4082(@code{po-msgid-to-msgstr}). 4083 4084@item k 4085@efindex k@r{, PO Mode command} 4086Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}). 4087 4088@item w 4089@efindex w@r{, PO Mode command} 4090Save the translation on the kill ring, without deleting it 4091(@code{po-kill-ring-save-msgstr}). 4092 4093@item y 4094@efindex y@r{, PO Mode command} 4095Replace the translation, taking the new from the kill ring 4096(@code{po-yank-msgstr}). 4097 4098@end table 4099 4100@efindex RET@r{, PO Mode command} 4101@efindex po-edit-msgstr@r{, PO Mode command} 4102The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs 4103window meant to edit in a new translation, or to modify an already existing 4104translation. The new window contains a copy of the translation taken from 4105the current PO file entry, all ready for edition, expunged of all quoting 4106marks, fully modifiable and with the complete extent of Emacs modifying 4107commands. 
When the translator is done with her modifications, she may use
@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
results, or @w{@kbd{C-c C-k}} to abort her modifications. @xref{Subedit},
for more information.

@efindex LFD@r{, PO Mode command}
@efindex C-j@r{, PO Mode command}
@efindex po-msgid-to-msgstr@r{, PO Mode command}
The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or
reinitializes, the translation with the original string. This command is
normally used when the translator wants to redo a fresh translation of
the original string, disregarding any previous work.

@evindex po-auto-edit-with-msgid@r{, PO Mode variable}
It is possible to arrange for the @kbd{@key{LFD}} command to be executed
automatically whenever an untranslated entry is edited. If you set
@code{po-auto-edit-with-msgid} to @code{t}, the translation gets
initialised with the original string, in case none exists already.
The default value for @code{po-auto-edit-with-msgid} is @code{nil}.

@emindex starting a string translation
In fact, whether it is best to start a translation with an empty
string, or rather with a copy of the original string, is a matter of
taste or habit. Sometimes, the source language and the
target language are so different that it is simply best to start writing
on an empty page. At other times, the source and target languages
are so close that it would be a waste to retype a number of words
already written in the original string. A translator may also
like having the original string right under her eyes, as she will
progressively overwrite the original text with the translation, even
if this requires some extra editing work to get rid of the original.
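
A translator who prefers to start from a copy of the original string
could, as a minimal sketch, set the variable discussed above in her
Emacs initialization file. Only the variable name and its values come
from this manual; where the line is placed is left to the translator.

@example
(setq po-auto-edit-with-msgid t)
@end example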
4138 4139@emindex cut and paste for translated strings 4140@efindex k@r{, PO Mode command} 4141@efindex po-kill-msgstr@r{, PO Mode command} 4142@efindex w@r{, PO Mode command} 4143@efindex po-kill-ring-save-msgstr@r{, PO Mode command} 4144The command @kbd{k} (@code{po-kill-msgstr}) merely empties the 4145translation string, so turning the entry into an untranslated 4146one. But while doing so, its previous contents is put apart in 4147a special place, known as the kill ring. The command @kbd{w} 4148(@code{po-kill-ring-save-msgstr}) has also the effect of taking a 4149copy of the translation onto the kill ring, but it otherwise leaves 4150the entry alone, and does @emph{not} remove the translation from the 4151entry. Both commands use exactly the Emacs kill ring, which is shared 4152between buffers, and which is well known already to Emacs lovers. 4153 4154The translator may use @kbd{k} or @kbd{w} many times in the course 4155of her work, as the kill ring may hold several saved translations. 4156From the kill ring, strings may later be reinserted in various 4157Emacs buffers. In particular, the kill ring may be used for moving 4158translation strings between different entries of a single PO file 4159buffer, or if the translator is handling many such buffers at once, 4160even between PO files. 4161 4162To facilitate exchanges with buffers which are not in PO mode, the 4163translation string put on the kill ring by the @kbd{k} command is fully 4164unquoted before being saved: external quotes are removed, multi-line 4165strings are concatenated, and backslash escaped sequences are turned 4166into their corresponding characters. In the special case of obsolete 4167entries, the translation is also uncommented prior to saving. 4168 4169@efindex y@r{, PO Mode command} 4170@efindex po-yank-msgstr@r{, PO Mode command} 4171The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the 4172translation of the current entry by a string taken from the kill ring. 
Following Emacs terminology, we then say that the replacement
string is @dfn{yanked} into the PO file buffer.
@xref{Yanking, , , emacs, The Emacs Editor}.
The first time @kbd{y} is used, the translation receives the value of
the most recent addition to the kill ring. If @kbd{y} is typed once
again, immediately, without intervening keystrokes, the translation
just inserted is taken away and replaced by the second most recent
addition to the kill ring. By repeating @kbd{y} many times in a row,
the translator may travel along the kill ring for saved strings,
until she finds the string she really wanted.

When a string is yanked into a PO file entry, it is fully and
automatically requoted for complying with the format PO files should
have. Further, if the entry is obsolete, PO mode then appropriately
pushes the inserted string inside comments. Once again, translators
need not burden themselves with quoting considerations, apart, of
course, from what the translated string itself requires with respect to
the program using it.

Note that @kbd{k} and @kbd{w} are not the only commands pushing strings
on the kill ring, as almost any PO mode command replacing translation
strings (or the translator comments) automatically saves the old string
on the kill ring. The main exceptions to this general rule are the
yanking commands themselves.

@emindex using obsolete translations to make new entries
To better illustrate the operation of killing and yanking, let's
use an actual example, taken from a common situation. When the
programmer slightly modifies some string right in the program, his
change is later reflected in the PO file by the appearance
of a new untranslated entry for the modified string, and the fact
that the entry translating the original or unmodified string becomes
obsolete.
In many cases, the translator might spare herself some work
by retrieving the unmodified translation from the obsolete entry,
then initializing the untranslated entry @code{msgstr} field with
this retrieved translation. Once this is done, the obsolete entry is
not wanted anymore, and may be safely deleted.

When the translator finds an untranslated entry and suspects that a
slight variant of the translation exists, she immediately uses @kbd{m}
to mark the current entry location, then starts chasing obsolete
entries with @kbd{o}, hoping to find some translation corresponding
to the unmodified string. Once found, she uses the @kbd{@key{DEL}} command
for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
the translation, that is, pushes the translation on the kill ring.
Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
then @emph{yanks} the saved translation right into the @code{msgstr}
field. The translator is then free to use @kbd{@key{RET}} for fine
tuning the translation contents, and maybe to later use @kbd{u},
then @kbd{m} again, for going on with the next untranslated string.

When some sequence of keys has to be typed over and over again, the
translator may find it useful to become better acquainted with the Emacs
capability of learning these sequences and playing them back on request.
@xref{Keyboard Macros, , , emacs, The Emacs Editor}.

@node Modifying Comments
@subsection Modifying Comments
@cindex editing comments in PO files
@emindex editing comments

Any translation work done seriously will raise many linguistic
difficulties, for which decisions have to be made, and the choices
further documented. These documents may be saved within the
PO file in the form of translator comments, which the translator
is free to create, delete, or modify at will.
These comments may 4239be useful to herself when she returns to this PO file after a while. 4240 4241Comments not having whitespace after the initial @samp{#}, for example, 4242those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator 4243comments, they are exclusively created by other @code{gettext} tools. 4244So, the commands below will never alter such system added comments, 4245they are not meant for the translator to modify. @xref{PO Files}. 4246 4247The following commands are somewhat similar to those modifying translations, 4248so the general indications given for those apply here. @xref{Modifying 4249Translations}. 4250 4251@table @kbd 4252 4253@item # 4254@efindex #@r{, PO Mode command} 4255Interactively edit the translator comments (@code{po-edit-comment}). 4256 4257@item K 4258@efindex K@r{, PO Mode command} 4259Save the translator comments on the kill ring, and delete it 4260(@code{po-kill-comment}). 4261 4262@item W 4263@efindex W@r{, PO Mode command} 4264Save the translator comments on the kill ring, without deleting it 4265(@code{po-kill-ring-save-comment}). 4266 4267@item Y 4268@efindex Y@r{, PO Mode command} 4269Replace the translator comments, taking the new from the kill ring 4270(@code{po-yank-comment}). 4271 4272@end table 4273 4274These commands parallel PO mode commands for modifying the translation 4275strings, and behave much the same way as they do, except that they handle 4276this part of PO file comments meant for translator usage, rather 4277than the translation strings. So, if the descriptions given below are 4278slightly succinct, it is because the full details have already been given. 4279@xref{Modifying Translations}. 4280 4281@efindex #@r{, PO Mode command} 4282@efindex po-edit-comment@r{, PO Mode command} 4283The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window 4284containing a copy of the translator comments on the current PO file entry. 
4285If there are no such comments, PO mode understands that the translator wants 4286to add a comment to the entry, and she is presented with an empty screen. 4287Comment marks (@code{#}) and the space following them are automatically 4288removed before edition, and reinstated after. For translator comments 4289pertaining to obsolete entries, the uncommenting and recommenting operations 4290are done twice. Once in the editing window, the keys @w{@kbd{C-c C-c}} 4291allow the translator to tell she is finished with editing the comment. 4292@xref{Subedit}, for further details. 4293 4294@evindex po-subedit-mode-hook@r{, PO Mode variable} 4295Functions found on @code{po-subedit-mode-hook}, if any, are executed after 4296the string has been inserted in the edit buffer. 4297 4298@efindex K@r{, PO Mode command} 4299@efindex po-kill-comment@r{, PO Mode command} 4300@efindex W@r{, PO Mode command} 4301@efindex po-kill-ring-save-comment@r{, PO Mode command} 4302@efindex Y@r{, PO Mode command} 4303@efindex po-yank-comment@r{, PO Mode command} 4304The command @kbd{K} (@code{po-kill-comment}) gets rid of all 4305translator comments, while saving those comments on the kill ring. 4306The command @kbd{W} (@code{po-kill-ring-save-comment}) takes 4307a copy of the translator comments on the kill ring, but leaves 4308them undisturbed in the current entry. The command @kbd{Y} 4309(@code{po-yank-comment}) completely replaces the translator comments 4310by a string taken at the front of the kill ring. When this command 4311is immediately repeated, the comments just inserted are withdrawn, 4312and replaced by other strings taken along the kill ring. 4313 4314On the kill ring, all strings have the same nature. There is no 4315distinction between @emph{translation} strings and @emph{translator 4316comments} strings. 
So, for example, let's presume the translator 4317has just finished editing a translation, and wants to create a new 4318translator comment to document why the previous translation was 4319not good, just to remember what was the problem. Foreseeing that she 4320will do that in her documentation, the translator may want to quote 4321the previous translation in her translator comments. To do so, she 4322may initialize the translator comments with the previous translation, 4323still at the head of the kill ring. Because editing already pushed the 4324previous translation on the kill ring, she merely has to type @kbd{M-w} 4325prior to @kbd{#}, and the previous translation will be right there, 4326all ready for being introduced by some explanatory text. 4327 4328On the other hand, presume there are some translator comments already 4329and that the translator wants to add to those comments, instead 4330of wholly replacing them. Then, she should edit the comment right 4331away with @kbd{#}. Once inside the editing window, she can use the 4332regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y} 4333(@code{yank-pop}) to get the previous translation where she likes. 4334 4335@node Subedit 4336@subsection Details of Sub Edition 4337@emindex subedit minor mode 4338 4339The PO subedit minor mode has a few peculiarities worth being described 4340in fuller detail. It installs a few commands over the usual editing set 4341of Emacs, which are described below. 4342 4343@table @kbd 4344@item C-c C-c 4345@efindex C-c C-c@r{, PO Mode command} 4346Complete edition (@code{po-subedit-exit}). 4347 4348@item C-c C-k 4349@efindex C-c C-k@r{, PO Mode command} 4350Abort edition (@code{po-subedit-abort}). 4351 4352@item C-c C-a 4353@efindex C-c C-a@r{, PO Mode command} 4354Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}). 
4355 4356@end table 4357 4358@emindex exiting PO subedit 4359@efindex C-c C-c@r{, PO Mode command} 4360@efindex po-subedit-exit@r{, PO Mode command} 4361The window's contents represents a translation for a given message, 4362or a translator comment. The translator may modify this window to 4363her heart's content. Once this is done, the command @w{@kbd{C-c C-c}} 4364(@code{po-subedit-exit}) may be used to return the edited translation into 4365the PO file, replacing the original translation, even if it moved out of 4366sight or if buffers were switched. 4367 4368@efindex C-c C-k@r{, PO Mode command} 4369@efindex po-subedit-abort@r{, PO Mode command} 4370If the translator becomes unsatisfied with her translation or comment, 4371to the extent she prefers keeping what was existent prior to the 4372@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}} 4373(@code{po-subedit-abort}) to merely get rid of edition, while preserving 4374the original translation or comment. Another way would be for her to exit 4375normally with @w{@kbd{C-c C-c}}, then type @code{U} once for undoing the 4376whole effect of last edition. 4377 4378@efindex C-c C-a@r{, PO Mode command} 4379@efindex po-subedit-cycle-auxiliary@r{, PO Mode command} 4380The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary}) 4381allows for glancing through translations 4382already achieved in other languages, directly while editing the current 4383translation. This may be quite convenient when the translator is fluent 4384at many languages, but of course, only makes sense when such completed 4385auxiliary PO files are already available to her (@pxref{Auxiliary}). 4386 4387Functions found on @code{po-subedit-mode-hook}, if any, are executed after 4388the string has been inserted in the edit buffer. 
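
As a hedged sketch of how this hook might be used, the following line
turns on spell checking in each subedit buffer. It assumes that the
hook functions run with the subedit buffer current, and it uses the
standard Emacs command @code{flyspell-mode}, which is not part of PO mode.

@example
(add-hook 'po-subedit-mode-hook 'flyspell-mode)
@end example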

While editing her translation, the translator should take care not to
insert unwanted @kbd{@key{RET}} (newline) characters at the end of
the translated string if those are not meant to be there, nor to remove
such characters when they are required. Since these characters are not
visible in the editing buffer, they are easily introduced by mistake.
To help her, @kbd{@key{RET}} automatically puts the character @code{<}
at the end of the string being edited, but this @code{<} is not really
part of the string. On exiting the editing window with @w{@kbd{C-c C-c}},
PO mode automatically removes such @code{<} and all whitespace added after
it. If the translator adds characters after the terminating @code{<}, it
loses its delimiting property and becomes an integral part of the string.
If she removes the delimiting @code{<}, then the edited string is taken
@emph{as is}, with all trailing newlines, even if invisible. Also, if
the translated string ought to end itself with a genuine @code{<}, then
the delimiting @code{<} may not be removed; so the string should appear,
in the editing window, as ending with two @code{<} in a row.

@emindex editing multiple entries
When a translation (or a comment) is being edited, the translator may move
the cursor back into the PO file buffer and freely move to other entries,
browsing at will. If, with an edition pending, the translator wanders in the
PO file buffer, she may decide to start modifying another entry. Each entry
being edited has its own subedit buffer. It is possible to simultaneously
edit the translation @emph{and} the comment of a single entry, or to
edit entries in different PO files, all at once. Typing @kbd{@key{RET}}
on a field already being edited merely resumes that particular edit. Yet,
the translator had better be comfortable handling many Emacs windows!
4417 4418@emindex pending subedits 4419Pending subedits may be completed or aborted in any order, regardless 4420of how or when they were started. When many subedits are pending and the 4421translator asks for quitting the PO file (with the @kbd{q} command), subedits 4422are automatically resumed one at a time, so she may decide for each of them. 4423 4424@node C Sources Context 4425@subsection C Sources Context 4426@emindex consulting program sources 4427@emindex looking at the source to aid translation 4428@emindex use the source, Luke 4429 4430PO mode is particularly powerful when used with PO files 4431created through GNU @code{gettext} utilities, as those utilities 4432insert special comments in the PO files they generate. 4433Some of these special comments relate the PO file entry to 4434exactly where the untranslated string appears in the program sources. 4435 4436When the translator gets to an untranslated entry, she is fairly 4437often faced with an original string which is not as informative as 4438it normally should be, being succinct, cryptic, or otherwise ambiguous. 4439Before choosing how to translate the string, she needs to understand 4440better what the string really means and how tight the translation has 4441to be. Most of the time, when problems arise, the only way left to make 4442her judgment is looking at the true program sources from where this 4443string originated, searching for surrounding comments the programmer 4444might have put in there, and looking around for helping clues of 4445@emph{any} kind. 4446 4447Surely, when looking at program sources, the translator will receive 4448more help if she is a fluent programmer. However, even if she is 4449not versed in programming and feels a little lost in C code, the 4450translator should not be shy at taking a look, once in a while. 4451It is most probable that she will still be able to find some of the 4452hints she needs. 
She will learn quickly to not feel uncomfortable 4453in program code, paying more attention to programmer's comments, 4454variable and function names (if he dared choosing them well), and 4455overall organization, than to the program code itself. 4456 4457@emindex find source fragment for a PO file entry 4458The following commands are meant to help the translator at getting 4459program source context for a PO file entry. 4460 4461@table @kbd 4462@item s 4463@efindex s@r{, PO Mode command} 4464Resume the display of a program source context, or cycle through them 4465(@code{po-cycle-source-reference}). 4466 4467@item M-s 4468@efindex M-s@r{, PO Mode command} 4469Display of a program source context selected by menu 4470(@code{po-select-source-reference}). 4471 4472@item S 4473@efindex S@r{, PO Mode command} 4474Add a directory to the search path for source files 4475(@code{po-consider-source-path}). 4476 4477@item M-S 4478@efindex M-S@r{, PO Mode command} 4479Delete a directory from the search path for source files 4480(@code{po-ignore-source-path}). 4481 4482@end table 4483 4484@efindex s@r{, PO Mode command} 4485@efindex po-cycle-source-reference@r{, PO Mode command} 4486@efindex M-s@r{, PO Mode command} 4487@efindex po-select-source-reference@r{, PO Mode command} 4488The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s} 4489(@code{po-select-source-reference}) both open another window displaying 4490some source program file, and already positioned in such a way that 4491it shows an actual use of the string to be translated. By doing 4492so, the command gives source program context for the string. But if 4493the entry has no source context references, or if all references 4494are unresolved along the search path for program sources, then the 4495command diagnoses this as an error. 4496 4497Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays 4498in the PO file window. 
If the translator really wants to 4499get into the program source window, she ought to do it explicitly, 4500maybe by using command @kbd{O}. 4501 4502When @kbd{s} is typed for the first time, or for a PO file entry which 4503is different from the last one used for getting source context, then the 4504command reacts by giving the first context available for this entry, 4505if any. If some context has already been recently displayed for the 4506current PO file entry, and the translator wandered off to do other 4507things, typing @kbd{s} again will merely resume, in another window, 4508the context last displayed. In particular, if the translator moved 4509the cursor away from the context in the source file, the command will 4510bring the cursor back to the context. When @kbd{s} is used many times 4511in a row, with no other commands intervening, PO mode cycles through 4512the available contexts for this particular entry, getting back 4513to the first context once the last has been shown. 4514 4515The command @kbd{M-s} behaves differently. Instead of cycling through 4516references, it lets the translator choose a particular reference among 4517many, and displays that reference. It is best used with completion: 4518if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in 4519response to the question, she will be offered a menu of all possible 4520references, as a reminder of which answers are acceptable. 4521This command is useful only where there are really many contexts 4522available for a single string to translate. 4523 4524@efindex S@r{, PO Mode command} 4525@efindex po-consider-source-path@r{, PO Mode command} 4526@efindex M-S@r{, PO Mode command} 4527@efindex po-ignore-source-path@r{, PO Mode command} 4528Program source files are usually found relative to where the PO 4529file stands. As a special provision, when this fails, the file is 4530also looked for, but relative to the directory immediately above it.
4531Those two cases take proper care of most PO files. However, it might 4532happen that a PO file has been moved, or is edited in a different 4533place than its normal location. When this happens, the translator 4534should tell PO mode in which directory normally sits the genuine PO 4535file. Many such directories may be specified, and all together, they 4536constitute what is called the @dfn{search path} for program sources. 4537The command @kbd{S} (@code{po-consider-source-path}) is used to interactively 4538enter a new directory at the front of the search path, and the command 4539@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion, 4540one of the directories she does not want anymore on the search path. 4541 4542@node Auxiliary 4543@subsection Consulting Auxiliary PO Files 4544@emindex consulting translations to other languages 4545 4546PO mode is able to help the knowledgeable translator, being fluent in 4547many languages, at taking advantage of translations already achieved 4548in other languages she just happens to know. It provides these other 4549language translations as additional context for her own work. Moreover, 4550it has features to ease the production of translations for many languages 4551at once, for translators preferring to work in this way. 4552 4553@cindex auxiliary PO file 4554@emindex auxiliary PO file 4555An @dfn{auxiliary} PO file is an existing PO file meant for the same 4556package the translator is working on, but targeted to a different mother 4557tongue language. Commands exist for declaring and handling auxiliary 4558PO files, and also for showing contexts for the entry under work. 4559 4560Here are the auxiliary file commands available in PO mode. 4561 4562@table @kbd 4563@item a 4564@efindex a@r{, PO Mode command} 4565Seek auxiliary files for another translation for the same entry 4566(@code{po-cycle-auxiliary}). 
4567 4568@item C-c C-a 4569@efindex C-c C-a@r{, PO Mode command} 4570Switch to a particular auxiliary file (@code{po-select-auxiliary}). 4571 4572@item A 4573@efindex A@r{, PO Mode command} 4574Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}). 4575 4576@item M-A 4577@efindex M-A@r{, PO Mode command} 4578Remove this PO file from the list of auxiliary files 4579(@code{po-ignore-as-auxiliary}). 4580 4581@end table 4582 4583@efindex A@r{, PO Mode command} 4584@efindex po-consider-as-auxiliary@r{, PO Mode command} 4585@efindex M-A@r{, PO Mode command} 4586@efindex po-ignore-as-auxiliary@r{, PO Mode command} 4587Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current 4588PO file to the list of auxiliary files, while command @kbd{M-A} 4589(@code{po-ignore-as-auxiliary}) just removes it. 4590 4591@efindex a@r{, PO Mode command} 4592@efindex po-cycle-auxiliary@r{, PO Mode command} 4593The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO 4594files, round-robin, searching for a translated entry in some other language 4595having an @code{msgid} field identical to the one for the current entry. 4596The found PO file, if any, takes the place of the current PO file in 4597the display (its window gets on top). Before doing so, the current PO 4598file is also made into an auxiliary file, if not already. So, @kbd{a} 4599in this newly displayed PO file will seek another PO file, and so on, 4600so repeating @kbd{a} will eventually yield back the original PO file. 4601 4602@efindex C-c C-a@r{, PO Mode command} 4603@efindex po-select-auxiliary@r{, PO Mode command} 4604The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator 4605for her choice of a particular auxiliary file, with completion, and 4606then switches to that selected PO file. The command also checks if 4607the selected file has an @code{msgid} field identical to the one for 4608the current entry, and if so, this entry becomes current.
Otherwise, 4609the cursor of the selected file is left undisturbed. 4610 4611For all this to work fully, auxiliary PO files will have to be normalized, 4612in the sense that @code{msgid} fields must be written @emph{exactly} 4613the same way. It is possible to write @code{msgid} fields in various 4614ways for representing the same string, and differing spellings would break the 4615proper behaviour of the auxiliary file commands of PO mode. This is not 4616expected to be much of a problem in practice, as most existing PO files have 4617their @code{msgid} entries written by the same GNU @code{gettext} tools. 4618 4619@efindex normalize@r{, PO Mode command} 4620However, PO files initially created by PO mode itself, while marking 4621strings in source files, are normalized differently. So are PO 4622files resulting from the @samp{M-x normalize} command. Until these 4623discrepancies between PO mode and other GNU @code{gettext} tools get 4624fully resolved, the translator should stay aware of normalization issues. 4625 4626@node Compendium 4627@section Using Translation Compendia 4628@emindex using translation compendia 4629 4630@cindex compendium 4631A @dfn{compendium} is a special PO file containing a set of 4632translations recurring in many different packages. The translator can 4633use gettext tools to build a new compendium, to add entries to her 4634compendium, and to initialize untranslated entries, or to update 4635already translated entries, from translations kept in the compendium. 4636 4637@menu 4638* Creating Compendia:: Merging translations for later use 4639* Using Compendia:: Using older translations if they fit 4640@end menu 4641 4642@node Creating Compendia 4643@subsection Creating Compendia 4644@cindex creating compendia 4645@cindex compendium, creating 4646 4647Basically every PO file consisting of translated entries only can be 4648declared as a valid compendium.
Often the translator wants to have 4649special compendia; let's consider two cases: @cite{concatenating PO 4650files} and @cite{extracting a message subset from a PO file}. 4651 4652@subsubsection Concatenate PO Files 4653 4654@cindex concatenating PO files into a compendium 4655@cindex accumulating translations 4656To concatenate several valid PO files into one compendium file you can 4657use @samp{msgcomm} or @samp{msgcat} (the latter preferred): 4658 4659@example 4660msgcat -o compendium.po file1.po file2.po 4661@end example 4662 4663By default, @code{msgcat} will accumulate divergent translations 4664for the same string. Those occurrences will be marked as @code{fuzzy} 4665and decorated in a highly visible way; calling @code{msgcat} on 4666@file{file1.po}: 4667 4668@example 4669#: src/hello.c:200 4670#, c-format 4671msgid "Report bugs to <%s>.\n" 4672msgstr "Comunicar `bugs' a <%s>.\n" 4673@end example 4674 4675@noindent 4676and @file{file2.po}: 4677 4678@example 4679#: src/bye.c:100 4680#, c-format 4681msgid "Report bugs to <%s>.\n" 4682msgstr "Comunicar \"bugs\" a <%s>.\n" 4683@end example 4684 4685@noindent 4686will result in: 4687 4688@example 4689#: src/hello.c:200 src/bye.c:100 4690#, fuzzy, c-format 4691msgid "Report bugs to <%s>.\n" 4692msgstr "" 4693"#-#-#-#-# file1.po #-#-#-#-#\n" 4694"Comunicar `bugs' a <%s>.\n" 4695"#-#-#-#-# file2.po #-#-#-#-#\n" 4696"Comunicar \"bugs\" a <%s>.\n" 4697@end example 4698 4699@noindent 4700The translator will have to resolve this ``conflict'' manually; she 4701has to decide whether the first or the second version is appropriate 4702(or provide a new translation), to delete the ``marker lines'', and 4703finally to remove the @code{fuzzy} mark.
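
A quick way to check that no conflict was overlooked is to search the
compendium for leftover marker lines.  The following is only a sketch: it
assumes the merged file is named @file{compendium.po}, as in the example
above, and it relies on the marker text @samp{#-#-#-#-#} shown there.

@example
# Assumes compendium.po was written by the msgcat call shown earlier.
if grep -q -e '#-#-#-#-#' compendium.po; then
  echo "conflict markers still present:"
  grep -n -e '#-#-#-#-#' compendium.po
fi
@end example

@noindent
Entries that still carry the @code{fuzzy} mark can be listed with
@samp{msgattrib --only-fuzzy compendium.po}.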
4704 4705If the translator knows in advance that the first translation found for a 4706message is always the best translation, she can use the 4707@samp{--use-first} switch: 4708 4709@example 4710msgcat --use-first -o compendium.po file1.po file2.po 4711@end example 4712 4713A good compendium file must not contain @code{fuzzy} or untranslated 4714entries. If input files are ``dirty'', you must preprocess the input 4715files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}. 4716 4717@subsubsection Extract a Message Subset from a PO File 4718@cindex extracting parts of a PO file into a compendium 4719 4720Nobody wants to translate the same messages again and again; thus you 4721may wish to have a compendium file containing @file{getopt.c} messages. 4722 4723To extract a message subset (e.g., all @file{getopt.c} messages) from an 4724existing PO file into one compendium file you can use @samp{msggrep}: 4725 4726@example 4727msggrep --location src/getopt.c -o compendium.po file.po 4728@end example 4729 4730@node Using Compendia 4731@subsection Using Compendia 4732 4733You can use a compendium file to initialize a translation from scratch 4734or to update an already existing translation. 4735 4736@subsubsection Initialize a New Translation File 4737@cindex initialize translations from a compendium 4738 4739Since a PO file with translations does not exist yet, the translator can 4740simply use @file{/dev/null} to fake the ``old'' translation file.
4741 4742@example 4743msgmerge --compendium compendium.po -o file.po /dev/null file.pot 4744@end example 4745 4746@subsubsection Update an Existing Translation File 4747@cindex update translations from a compendium 4748 4749Concatenate the compendium file(s) and the existing PO, merge the 4750result with the POT file and remove the obsolete entries (optional, 4751here done using @samp{msgattrib}): 4752 4753@example 4754msgcat --use-first -o update.po compendium1.po compendium2.po file.po 4755msgmerge update.po file.pot | msgattrib --no-obsolete > file.po 4756@end example 4757 4758@node Manipulating 4759@chapter Manipulating PO Files 4760@cindex manipulating PO files 4761 4762Sometimes it is necessary to manipulate PO files in a way that is better 4763performed automatically than by hand. GNU @code{gettext} includes a 4764complete set of tools for this purpose. 4765 4766@cindex merging two PO files 4767When merging two packages into a single package, the resulting POT file 4768will be the concatenation of the two packages' POT files. Thus the 4769maintainer must concatenate the two existing package translations into 4770a single translation catalog, for each language. This is best performed 4771using @samp{msgcat}. It is then the translators' duty to deal with any 4772possible conflicts that arose during the merge. 4773 4774@cindex encoding conversion 4775When a translator takes over the translation job from another translator, 4776but she uses a different character encoding in her locale, she will 4777convert the catalog to her character encoding. This is best done through 4778the @samp{msgconv} program. 4779 4780When a maintainer takes a source file with tagged messages from another 4781package, he should also take the existing translations for this source 4782file (and not let the translators do the same job twice). One way to do 4783this is through @samp{msggrep}, another is to create a POT file for 4784that source file and use @samp{msgmerge}. 
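
The encoding conversion mentioned above can be sketched as follows.  The
catalog content and the file names are invented for this illustration; only
the @code{msgconv} options @samp{--to-code} and @samp{-o} are taken from the
tool itself.

@example
# A made-up catalog declared as ISO-8859-1 (file names are placeholders).
printf '%s\n' \
  'msgid ""' \
  'msgstr "Content-Type: text/plain; charset=ISO-8859-1\n"' \
  '' \
  'msgid "Open"' \
  'msgstr "Ouvrir"' > fr-latin1.po

# Rewrite it in UTF-8; the charset in the header entry is updated as well.
msgconv --to-code=UTF-8 -o fr-utf8.po fr-latin1.po
@end example

@noindent
The input file is left untouched, so the result can be compared with the
original before the latter is replaced.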
4785 4786@cindex dialect 4787@cindex orthography 4788When a translator wants to adjust some translation catalog for a special 4789dialect or orthography --- for example, German as written in Switzerland 4790versus German as written in Germany --- she needs to apply some text 4791processing to every message in the catalog. The tool for doing this is 4792@samp{msgfilter}. 4793 4794Another use of @code{msgfilter} is to produce approximately the POT file for 4795which a given PO file was made. This can be done through a filter command 4796like @samp{msgfilter sed -e d | sed -e '/^# /d'}. Note that the original 4797POT file may have had different comments and different plural message counts, 4798that's why it's better to use the original POT file if available. 4799 4800@cindex checking of translations 4801When a translator wants to check her translations, for example according 4802to orthography rules or using a non-interactive spell checker, she can do 4803so using the @samp{msgexec} program. 4804 4805@cindex duplicate elimination 4806When third party tools create PO or POT files, sometimes duplicates cannot 4807be avoided. But the GNU @code{gettext} tools give an error when they 4808encounter duplicate msgids in the same file and in the same domain. 4809To merge duplicates, the @samp{msguniq} program can be used. 4810 4811@samp{msgcomm} is a more general tool for keeping or throwing away 4812duplicates, occurring in different files. 4813 4814@samp{msgcmp} can be used to check whether a translation catalog is 4815completely translated. 4816 4817@cindex attributes, manipulating 4818@samp{msgattrib} can be used to select and extract only the fuzzy 4819or untranslated messages of a translation catalog. 4820 4821@samp{msgen} is useful as a first step for preparing English translation 4822catalogs. It copies each message's msgid to its msgstr. 
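
As a small illustration of the last two tools, the sketch below builds an
English catalog from a template and then asks for the entries that are
still untranslated.  The template @file{demo.pot} and its single message
are made up for the example; a real template would also contain a header
entry.

@example
# Tiny made-up template; demo.pot is a placeholder name.
printf '%s\n' 'msgid "Hello"' 'msgstr ""' > demo.pot

# Copy each msgid to its msgstr, giving an English catalog.
msgen -o en.po demo.pot

# Print the entries that still lack a translation.
msgattrib --untranslated en.po
@end example

@noindent
Since @code{msgen} fills in every @code{msgstr}, the final command is
expected to find nothing to report here.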
4823 4824Finally, for those applications where all these various programs are not 4825sufficient, a library @samp{libgettextpo} is provided that can be used to 4826write other specialized programs that process PO files. 4827 4828@menu 4829* msgcat Invocation:: Invoking the @code{msgcat} Program 4830* msgconv Invocation:: Invoking the @code{msgconv} Program 4831* msggrep Invocation:: Invoking the @code{msggrep} Program 4832* msgfilter Invocation:: Invoking the @code{msgfilter} Program 4833* msguniq Invocation:: Invoking the @code{msguniq} Program 4834* msgcomm Invocation:: Invoking the @code{msgcomm} Program 4835* msgcmp Invocation:: Invoking the @code{msgcmp} Program 4836* msgattrib Invocation:: Invoking the @code{msgattrib} Program 4837* msgen Invocation:: Invoking the @code{msgen} Program 4838* msgexec Invocation:: Invoking the @code{msgexec} Program 4839* Colorizing:: Highlighting parts of PO files 4840* Other tools:: Other tools for manipulating PO files 4841* libgettextpo:: Writing your own programs that process PO files 4842@end menu 4843 4844@node msgcat Invocation 4845@section Invoking the @code{msgcat} Program 4846 4847@include msgcat.texi 4848 4849@node msgconv Invocation 4850@section Invoking the @code{msgconv} Program 4851 4852@include msgconv.texi 4853 4854@node msggrep Invocation 4855@section Invoking the @code{msggrep} Program 4856 4857@include msggrep.texi 4858 4859@node msgfilter Invocation 4860@section Invoking the @code{msgfilter} Program 4861 4862@include msgfilter.texi 4863 4864@node msguniq Invocation 4865@section Invoking the @code{msguniq} Program 4866 4867@include msguniq.texi 4868 4869@node msgcomm Invocation 4870@section Invoking the @code{msgcomm} Program 4871 4872@include msgcomm.texi 4873 4874@node msgcmp Invocation 4875@section Invoking the @code{msgcmp} Program 4876 4877@include msgcmp.texi 4878 4879@node msgattrib Invocation 4880@section Invoking the @code{msgattrib} Program 4881 4882@include msgattrib.texi 4883 4884@node msgen 
Invocation 4885@section Invoking the @code{msgen} Program 4886 4887@include msgen.texi 4888 4889@node msgexec Invocation 4890@section Invoking the @code{msgexec} Program 4891 4892@include msgexec.texi 4893 4894@node Colorizing 4895@section Highlighting parts of PO files 4896 4897Translators are usually only interested in seeing the untranslated and 4898fuzzy messages of a PO file. Also, when a message is set fuzzy because 4899the msgid changed, they want to see the differences between the previous 4900msgid and the current one (especially if the msgid is long and only few 4901words in it have changed). Finally, it's always welcome to highlight the 4902different sections of a message in a PO file (comments, msgid, msgstr, etc.). 4903 4904Such highlighting is possible through the options @samp{--color} and 4905@samp{--style}. They are supported by all the programs that produce 4906a PO file on standard output, such as @code{msgcat}, @code{msgmerge}, 4907and @code{msgunfmt}. 4908 4909@menu 4910* The --color option:: Triggering colorized output 4911* The TERM variable:: The environment variable @code{TERM} 4912* The --style option:: The @code{--style} option 4913* Style rules:: Style rules for PO files 4914* Customizing less:: Customizing @code{less} for viewing PO files 4915@end menu 4916 4917@node The --color option 4918@subsection The @code{--color} option 4919 4920@opindex --color@r{, @code{msgcat} option} 4921The @samp{--color=@var{when}} option specifies under which conditions 4922colorized output should be generated. The @var{when} part can be one of 4923the following: 4924 4925@table @code 4926@item always 4927@itemx yes 4928The output will be colorized. 4929 4930@item never 4931@itemx no 4932The output will not be colorized. 4933 4934@item auto 4935@itemx tty 4936The output will be colorized if the output device is a tty, i.e.@: when the 4937output goes directly to a text screen or terminal emulator window. 
4938 4939@item html 4940The output will be colorized and be in HTML format. 4941 4942@item test 4943This is a special value, understood only by the @code{msgcat} program. It 4944is explained in the next section (@ref{The TERM variable}). 4945@end table 4946 4947@noindent 4948@samp{--color} is equivalent to @samp{--color=yes}. The default is 4949@samp{--color=auto}. 4950 4951Thus, a command like @samp{msgcat vi.po} will produce colorized output 4952when called by itself in a command window. In a pipe, however, such as 4953@samp{msgcat vi.po | less -R}, it will not produce colorized output. To 4954get colorized output in this situation nevertheless, use the command 4955@samp{msgcat --color vi.po | less -R}. 4956 4957The @samp{--color=html} option will produce output that can be viewed in 4958a browser. This can be useful, for example, for Indic languages, 4959because the rendering of Indic scripts in browsers is usually better than 4960in terminal emulators. 4961 4962Note that the output produced with the @code{--color} option is @emph{not} 4963a valid PO file in itself. It contains additional terminal-specific escape 4964sequences or HTML tags. A PO file reader will give a syntax error when 4965confronted with such content. Except for the @samp{--color=html} case, 4966you therefore normally don't need to save output produced with the 4967@code{--color} option in a file. 4968 4969@node The TERM variable 4970@subsection The environment variable @code{TERM} 4971 4972@vindex TERM@r{, environment variable} 4973The environment variable @code{TERM} contains an identifier for the text 4974window's capabilities. You can get a detailed list of these capabilities 4975by using the @samp{infocmp} command, using @samp{man 5 terminfo} as a 4976reference. 4977 4978When producing text with embedded color directives, @code{msgcat} looks 4979at the @code{TERM} variable. Text windows today typically support at least 49808 colors.
Often, however, the text window supports 16 or more colors, 4981even though the @code{TERM} variable is set to an identifier denoting only 49828 supported colors. It can be worth setting the @code{TERM} variable to 4983a different value in these cases: 4984 4985@table @code 4986@item xterm 4987@code{xterm} is in most cases built with support for 16 colors. It can also 4988be built with support for 88 or 256 colors (but not both). You can try to 4989set @code{TERM} to either @code{xterm-16color}, @code{xterm-88color}, or 4990@code{xterm-256color}. 4991 4992@item rxvt 4993@code{rxvt} is often built with support for 16 colors. You can try to set 4994@code{TERM} to @code{rxvt-16color}. 4995 4996@item konsole 4997@code{konsole} too is often built with support for 16 colors. You can try to 4998set @code{TERM} to @code{konsole-16color} or @code{xterm-16color}. 4999@end table 5000 5001After setting @code{TERM}, you can verify it by invoking 5002@samp{msgcat --color=test} and seeing whether the output looks like a 5003reasonable color map. 5004 5005@node The --style option 5006@subsection The @code{--style} option 5007 5008@opindex --style@r{, @code{msgcat} option} 5009The @samp{--style=@var{style_file}} option specifies the style file to use 5010when colorizing. It has an effect only when the @code{--color} option is 5011effective. 5012 5013@vindex PO_STYLE@r{, environment variable} 5014If the @code{--style} option is not specified, the environment variable 5015@code{PO_STYLE} is considered. It is meant to point to the user's 5016preferred style for PO files. 5017 5018The default style file is @file{$prefix/share/gettext/styles/po-default.css}, 5019where @code{$prefix} is the installation location. 5020 5021A few style files are predefined: 5022@table @file 5023@item po-vim.css 5024This style imitates the look used by vim 7. 5025 5026@item po-emacs-x.css 5027This style imitates the look used by GNU Emacs 21 and 22 in an X11 window.
5028 5029@item po-emacs-xterm.css 5030@itemx po-emacs-xterm16.css 5031@itemx po-emacs-xterm256.css 5032This style imitates the look used by GNU Emacs 22 in a terminal of type 5033@samp{xterm} (8 colors) or @samp{xterm-16color} (16 colors) or 5034@samp{xterm-256color} (256 colors), respectively. 5035@end table 5036 5037@noindent 5038You can use these styles without specifying a directory. They are actually 5039located in @file{$prefix/share/gettext/styles/}, where @code{$prefix} is the 5040installation location. 5041 5042You can also design your own styles. This is described in the next section. 5043 5044 5045@node Style rules 5046@subsection Style rules for PO files 5047 5048The same style file can be used for styling of a PO file, for terminal 5049output and for HTML output. It is written in CSS (Cascading Style Sheet) 5050syntax. See @url{https://www.w3.org/TR/css2/cover.html} for a formal 5051definition of CSS. Many HTML authoring tutorials also contain explanations 5052of CSS. 5053 5054In the case of HTML output, the style file is embedded in the HTML output. 5055In the case of text output, the style file is interpreted by the 5056@code{msgcat} program. This means, in particular, that when 5057@code{@@import} is used with relative file names, the file names are 5058 5059@itemize @minus 5060@item 5061relative to the resulting HTML file, in the case of HTML output, 5062 5063@item 5064relative to the style sheet containing the @code{@@import}, in the case of 5065text output. (Actually, @code{@@import}s are not yet supported in this case, 5066due to a limitation in @code{libcroco}.) 5067@end itemize 5068 5069CSS rules are built up from selectors and declarations. The declarations 5070specify graphical properties; the selectors specify when they apply. 5071 5072In PO files, the following simple selectors (based on "CSS classes", see 5073the CSS2 spec, section 5.8.3) are supported. 
5074 5075@itemize @bullet 5076@item 5077Selectors that apply to entire messages: 5078 5079@table @code 5080@item .header 5081This matches the header entry of a PO file. 5082 5083@item .translated 5084This matches a translated message. 5085 5086@item .untranslated 5087This matches an untranslated message (i.e.@: a message with empty translation). 5088 5089@item .fuzzy 5090This matches a fuzzy message (i.e.@: a message which has a translation that 5091needs review by the translator). 5092 5093@item .obsolete 5094This matches an obsolete message (i.e.@: a message that was translated but is 5095not needed by the current POT file any more). 5096@end table 5097 5098@item 5099Selectors that apply to parts of a message in PO syntax. Recall the general 5100structure of a message in PO syntax: 5101 5102@example 5103@var{white-space} 5104# @var{translator-comments} 5105#. @var{extracted-comments} 5106#: @var{reference}@dots{} 5107#, @var{flag}@dots{} 5108#| msgid @var{previous-untranslated-string} 5109msgid @var{untranslated-string} 5110msgstr @var{translated-string} 5111@end example 5112 5113@table @code 5114@item .comment 5115This matches all comments (translator comments, extracted comments, 5116source file reference comments, flag comments, previous message comments, 5117as well as the entire obsolete messages). 5118 5119@item .translator-comment 5120This matches the translator comments. 5121 5122@item .extracted-comment 5123This matches the extracted comments, i.e.@: the comments placed by the 5124programmer at the attention of the translator. 5125 5126@item .reference-comment 5127This matches the source file reference comments (entire lines). 5128 5129@item .reference 5130This matches the individual source file references inside the source file 5131reference comment lines. 5132 5133@item .flag-comment 5134This matches the flag comment lines (entire lines). 5135 5136@item .flag 5137This matches the individual flags inside flag comment lines. 
5138 5139@item .fuzzy-flag 5140This matches the `fuzzy' flag inside flag comment lines. 5141 5142@item .previous-comment 5143This matches the comments containing the previous untranslated string (entire 5144lines). 5145 5146@item .previous 5147This matches the previous untranslated string including the string delimiters, 5148the associated keywords (@code{msgid} etc.) and the spaces between them. 5149 5150@item .msgid 5151This matches the untranslated string including the string delimiters, 5152the associated keywords (@code{msgid} etc.) and the spaces between them. 5153 5154@item .msgstr 5155This matches the translated string including the string delimiters, 5156the associated keywords (@code{msgstr} etc.) and the spaces between them. 5157 5158@item .keyword 5159This matches the keywords (@code{msgid}, @code{msgstr}, etc.). 5160 5161@item .string 5162This matches strings, including the string delimiters (double quotes). 5163@end table 5164 5165@item 5166Selectors that apply to parts of strings: 5167 5168@table @code 5169@item .text 5170This matches the entire contents of a string (excluding the string delimiters, 5171i.e.@: the double quotes). 5172 5173@item .escape-sequence 5174This matches an escape sequence (starting with a backslash). 5175 5176@item .format-directive 5177This matches a format string directive (starting with a @samp{%} sign in the 5178case of most programming languages, with a @samp{@{} in the case of 5179@code{java-format} and @code{csharp-format}, with a @samp{~} in the case of 5180@code{lisp-format} and @code{scheme-format}, or with @samp{$} in the case of 5181@code{sh-format}). 5182 5183@item .invalid-format-directive 5184This matches an invalid format string directive. 5185 5186@item .added 5187In an untranslated string, this matches a part of the string that was not 5188present in the previous untranslated string. (Not yet implemented in this 5189release.) 

@item .changed
In an untranslated string or in a previous untranslated string, this matches
a part of the string that is changed or replaced. (Not yet implemented in
this release.)

@item .removed
In a previous untranslated string, this matches a part of the string that
is not present in the current untranslated string. (Not yet implemented in
this release.)
@end table
@end itemize

These selectors can be combined into hierarchical selectors. For example,

@smallexample
.msgstr .invalid-format-directive @{ color: red; @}
@end smallexample

@noindent
will highlight the invalid format directives in the translated strings.

In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements
(CSS2 spec, section 5.12) are not supported.

The declarations in HTML mode are not limited; any graphical attribute
supported by the browsers can be used.

The declarations in text mode are limited to the following properties. Other
properties will be silently ignored.

@table @asis
@item @code{color} (CSS2 spec, section 14.1)
@itemx @code{background-color} (CSS2 spec, section 14.2.1)
These properties are supported. Colors will be adjusted to match the terminal's
capabilities. Note that many terminals support only 8 colors.

@item @code{font-weight} (CSS2 spec, section 15.2.5)
This property is supported, but most terminals can only render two different
weights: @code{normal} and @code{bold}. Values >= 600 are rendered as
@code{bold}.

@item @code{font-style} (CSS2 spec, section 15.2.3)
This property is supported. The values @code{italic} and @code{oblique} are
rendered the same way.

@item @code{text-decoration} (CSS2 spec, section 16.3.1)
This property is supported, limited to the values @code{none} and
@code{underline}.
@end table

@node Customizing less
@subsection Customizing @code{less} for viewing PO files

The @samp{less} program is a popular text file browser for use in a text
screen or terminal emulator. It also supports text with embedded escape
sequences for colors and text decorations.

You can use @code{less} to view a PO file like this (assuming a UTF-8
environment):

@smallexample
msgcat --to-code=UTF-8 --color xyz.po | less -R
@end smallexample

You can shorten this to the command

@smallexample
less xyz.po
@end smallexample

@noindent
after these three preparations:

@enumerate
@item
Add the options @samp{-R} and @samp{-f} to the @code{LESS} environment
variable. In sh shells:
@smallexample
$ LESS="$LESS -R -f"
$ export LESS
@end smallexample

@item
If your system does not already have the @file{lessopen.sh} and
@file{lessclose.sh} scripts, create them and set the @code{LESSOPEN} and
@code{LESSCLOSE} environment variables, as indicated in the manual page
(@samp{man less}).

@item
Add to @file{lessopen.sh} a piece of script that recognizes PO files
through their file extension and invokes @code{msgcat} on them, producing
a temporary file. Like this:

@smallexample
case "$1" in
  *.po)
    tmpfile=`mktemp "$@{TMPDIR-/tmp@}/less.XXXXXX"`
    msgcat --to-code=UTF-8 --color "$1" > "$tmpfile"
    echo "$tmpfile"
    exit 0
    ;;
esac
@end smallexample
@end enumerate

@node Other tools
@section Other tools for manipulating PO files

@cindex Pology
The ``Pology'' package is a Free Software package for manipulating PO files.
It features, in particular:

@itemize
@item
Examination and in-place modification of collections of PO files.
@item
Format-aware diffing and patching of PO files.
@item
Handling of version-control branches.
@item
Fine-grained asynchronous review workflow.
@item
Custom translation validation.
@item
Language and project specific support.
@end itemize

Its home page is at @url{http://pology.nedohodnik.net/}.

@node libgettextpo
@section Writing your own programs that process PO files

For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc.
is not sufficient, a set of C functions is provided in a library, to make it
possible to process PO files in your own programs. When you use this library,
you don't need to write routines to parse the PO file; instead, you retrieve
a pointer in memory to each of the messages contained in the PO file. Functions
for writing those memory structures to a file after working with them are
provided too.

The functions are declared in the header file @samp{<gettext-po.h>}, and are
defined in a library called @samp{libgettextpo}.

@menu
* Error Handling::              Error handling functions
* po_file_t API::               File management
* po_message_iterator_t API::   Message iteration
* po_message_t API::            The basic units of the file
* PO Header Entry API::         Meta information of the file
* po_filepos_t API::            References to the sources
* Format Type API::             Supported format types
* Checking API::                Enforcing constraints
@end menu

The following example shows how these functions can be used. Error
handling code is omitted, as its implementation is delegated to the
user-provided functions.

@example
struct po_xerror_handler handler =
  @{
    .xerror = @dots{},
    .xerror2 = @dots{}
  @};
const char *filename = @dots{};
/* Read the file into memory.
*/
po_file_t file = po_file_read (filename, &handler);

@{
  const char * const *domains = po_file_domains (file);
  const char * const *domainp;

  /* Iterate the domains contained in the file. */
  for (domainp = domains; *domainp; domainp++)
    @{
      /* po_message_t is already a pointer type. */
      po_message_t message;
      const char *domain = *domainp;
      po_message_iterator_t iterator = po_message_iterator (file, domain);

      /* Iterate each message inside the domain. */
      while ((message = po_next_message (iterator)) != NULL)
        @{
          /* Read data from the message @dots{} */
          const char *msgid = po_message_msgid (message);
          const char *msgstr = po_message_msgstr (message);

          @dots{}

          /* Modify its contents @dots{} */
          if (perform_some_tests (msgid, msgstr))
            po_message_set_fuzzy (message, 1);

          @dots{}
        @}
      /* Always release returned po_message_iterator_t. */
      po_message_iterator_free (iterator);
    @}

  /* Write back the result. */
  po_file_t result = po_file_write (file, filename, &handler);
@}

/* Always release the returned po_file_t. */
po_file_free (file);
@end example

@node Error Handling
@subsection Error Handling

Error management is performed through callbacks provided by the user of
the library. They are provided through a parameter with the following
type:

@deftp {Data Type} struct po_xerror_handler
A pointer to this structure is defined as @code{po_xerror_handler_t}. It
contains two fields, @code{xerror} and @code{xerror2}, with the following
function signatures.
@end deftp

@deftypefun void xerror (int@tie{}@var{severity}, po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{filename}, size_t@tie{}@var{lineno}, size_t@tie{}@var{column}, int@tie{}@var{multiline_p}, const@tie{}char@tie{}*@var{message_text})

This function is called to signal a problem of the given @var{severity}.
It @emph{must not return} if @var{severity} is
@code{PO_SEVERITY_FATAL_ERROR}.

@var{message_text} is the problem description. When @var{multiline_p}
is true, it can contain multiple lines of text, each terminated with a
newline, otherwise a single line.

@var{message} and/or @var{filename} and @var{lineno} indicate where the
problem occurred:

@itemize @bullet
@item
If @var{filename} is @code{NULL}, @var{filename} and @var{lineno} and
@var{column} should be ignored.

@item
If @var{lineno} is @code{(size_t)(-1)}, @var{lineno} and @var{column}
should be ignored.

@item
If @var{column} is @code{(size_t)(-1)}, it should be ignored.
@end itemize
@end deftypefun

@deftypefun void xerror2 (int@tie{}@var{severity}, po_message_t@tie{}@var{message1}, const@tie{}char@tie{}*@var{filename1}, size_t@tie{}@var{lineno1}, size_t@tie{}@var{column1}, int@tie{}@var{multiline_p1}, const@tie{}char@tie{}*@var{message_text1}, po_message_t@tie{}@var{message2}, const@tie{}char@tie{}*@var{filename2}, size_t@tie{}@var{lineno2}, size_t@tie{}@var{column2}, int@tie{}@var{multiline_p2}, const@tie{}char@tie{}*@var{message_text2})

This function is called to signal a problem of the given @var{severity}
that refers to two messages. It @emph{must not return} if
@var{severity} is @code{PO_SEVERITY_FATAL_ERROR}.

It is similar to two calls to @code{xerror}. If possible, an ellipsis can be
appended to @var{message_text1} and prepended to @var{message_text2}.
@end deftypefun

@node po_file_t API
@subsection po_file_t API

@deftp {Data Type} po_file_t
This is a pointer type that refers to the contents of a PO file, after it has
been read into memory.
@end deftp

@deftypefun po_file_t po_file_create ()
The @code{po_file_create} function creates an empty PO file representation in
memory.
@end deftypefun

@deftypefun po_file_t po_file_read (const@tie{}char@tie{}*@var{filename}, struct@tie{}po_xerror_handler@tie{}*@var{handler})
The @code{po_file_read} function reads a PO file into memory. The file name
is given as argument. The return value is a handle to the PO file's contents,
valid until @code{po_file_free} is called on it. In case of error, the
functions from @var{handler} are called to signal it.

This function is exported as @samp{po_file_read_v3} at ABI level, but is
defined as @code{po_file_read} in C code after the inclusion of
@samp{<gettext-po.h>}.
@end deftypefun

@deftypefun po_file_t po_file_write (po_file_t@tie{}@var{file}, const@tie{}char@tie{}*@var{filename}, struct@tie{}po_xerror_handler@tie{}*@var{handler})
The @code{po_file_write} function writes the contents of the memory
structure @var{file} to the file named @var{filename}. The return value is
@var{file} after a successful operation. In case of error, the
functions from @var{handler} are called to signal it.

This function is exported as @samp{po_file_write_v2} at ABI level, but
is defined as @code{po_file_write} in C code after the inclusion of
@samp{<gettext-po.h>}.
@end deftypefun

@deftypefun void po_file_free (po_file_t@tie{}@var{file})
The @code{po_file_free} function frees a PO file's contents from memory,
including all messages that are only implicitly accessible through iterators.
@end deftypefun

@deftypefun {const char * const *} po_file_domains (po_file_t@tie{}@var{file})
The @code{po_file_domains} function returns the domains for which the given
PO file has messages. The return value is a @code{NULL} terminated array
which is valid as long as the @var{file} handle is valid. For PO files which
contain no @samp{domain} directive, the return value contains only one domain,
namely the default domain @code{"messages"}.
@end deftypefun

@node po_message_iterator_t API
@subsection po_message_iterator_t API

@deftp {Data Type} po_message_iterator_t
This is a pointer type that refers to an iterator that produces a sequence of
messages.
@end deftp

@deftypefun po_message_iterator_t po_message_iterator (po_file_t@tie{}@var{file}, const@tie{}char@tie{}*@var{domain})
The @code{po_message_iterator} function returns an iterator that will produce
the messages of @var{file} that belong to the given @var{domain}. If
@var{domain} is @code{NULL}, the default domain is used instead. To list the
messages, use the function @code{po_next_message} repeatedly.
@end deftypefun

@deftypefun void po_message_iterator_free (po_message_iterator_t@tie{}@var{iterator})
The @code{po_message_iterator_free} function frees an iterator previously
allocated through the @code{po_message_iterator} function.
@end deftypefun

@deftypefun po_message_t po_next_message (po_message_iterator_t@tie{}@var{iterator})
The @code{po_next_message} function returns the next message from
@var{iterator} and advances the iterator. It returns @code{NULL} when the
iterator has reached the end of its message list.
@end deftypefun

@node po_message_t API
@subsection po_message_t API

@deftp {Data Type} po_message_t
This is a pointer type that refers to a message of a PO file, including its
translation.
@end deftp

@deftypefun {po_message_t} po_message_create (void)
Returns a freshly constructed message. To finish initializing the
message, you must set the @code{msgid} and @code{msgstr}. It @emph{must} be
inserted into a file to manage its memory, as there is no
@code{po_message_free} available to the user of the library.
@end deftypefun

The following functions access details of a @code{po_message_t}.
Recall
that the results are valid as long as the @var{file} handle is valid.

@deftypefun {const char *} po_message_msgctxt (po_message_t@tie{}@var{message})
The @code{po_message_msgctxt} function returns the @code{msgctxt}, the
context of @var{message}. Returns @code{NULL} for a message not restricted
to a context.
@end deftypefun

@deftypefun {void} po_message_set_msgctxt (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgctxt})
The @code{po_message_set_msgctxt} function changes the @code{msgctxt},
the context of the message, to the value provided through
@var{msgctxt}. The value @code{NULL} removes the restriction.
@end deftypefun

@deftypefun {const char *} po_message_msgid (po_message_t@tie{}@var{message})
The @code{po_message_msgid} function returns the @code{msgid} (untranslated
English string) of @var{message}. This is guaranteed to be non-@code{NULL}.
@end deftypefun

@deftypefun {void} po_message_set_msgid (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgid})
The @code{po_message_set_msgid} function changes the @code{msgid}
(untranslated English string) of @var{message} to the value provided through
@var{msgid}, a non-@code{NULL} string.
@end deftypefun

@deftypefun {const char *} po_message_msgid_plural (po_message_t@tie{}@var{message})
The @code{po_message_msgid_plural} function returns the @code{msgid_plural}
(untranslated English plural string) of @var{message}, a message with plurals,
or @code{NULL} for a message without plural.
@end deftypefun

@deftypefun {void} po_message_set_msgid_plural (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgid_plural})
The @code{po_message_set_msgid_plural} function changes the
@code{msgid_plural} (untranslated English plural string) of a message to
the value provided through @var{msgid_plural}, or removes the plurals if
@code{NULL} is provided as @var{msgid_plural}.
@end deftypefun

@deftypefun {const char *} po_message_msgstr (po_message_t@tie{}@var{message})
The @code{po_message_msgstr} function returns the @code{msgstr} (translation)
of @var{message}. For an untranslated message, the return value is an empty
string.
@end deftypefun

@deftypefun {void} po_message_set_msgstr (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{msgstr})
The @code{po_message_set_msgstr} function changes the @code{msgstr}
(translation) of @var{message} to the value provided through @var{msgstr}, a
non-@code{NULL} string.
@end deftypefun

@deftypefun {const char *} po_message_msgstr_plural (po_message_t@tie{}@var{message}, int@tie{}@var{index})
The @code{po_message_msgstr_plural} function returns the
@code{msgstr[@var{index}]} of @var{message}, a message with plurals, or
@code{NULL} when the @var{index} is out of range or for a message without
plural.
@end deftypefun

@deftypefun {void} po_message_set_msgstr_plural (po_message_t@tie{}@var{message}, int@tie{}@var{index}, const@tie{}char@tie{}*@var{msgstr_plural})
The @code{po_message_set_msgstr_plural} function changes the
@code{msgstr[@var{index}]} of @var{message}, a message with plurals, to
the value provided through @var{msgstr_plural}. @var{message} must be a
message with plurals.
Use @code{NULL} as the value of @var{msgstr_plural} with
@var{index} pointing to the last element to reduce the number of plural
forms.
@end deftypefun

@deftypefun {const char *} po_message_comments (po_message_t@tie{}@var{message})
The @code{po_message_comments} function returns the comments of @var{message},
a multiline string, ending in a newline, or a non-@code{NULL} empty string.
@end deftypefun

@deftypefun {void} po_message_set_comments (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{comments})
The @code{po_message_set_comments} function changes the comments of
@var{message} to the value @var{comments}, a multiline string, ending in a
newline, or a non-@code{NULL} empty string.
@end deftypefun

@deftypefun {const char *} po_message_extracted_comments (po_message_t@tie{}@var{message})
The @code{po_message_extracted_comments} function returns the extracted
comments of @var{message}, a multiline string, ending in a newline, or a
non-@code{NULL} empty string.
@end deftypefun

@deftypefun {void} po_message_set_extracted_comments (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{extracted_comments})
The @code{po_message_set_extracted_comments} function changes the
extracted comments of @var{message} to the value @var{extracted_comments}, a
multiline string, ending in a newline, or a non-@code{NULL} empty string.
@end deftypefun

@deftypefun {const char *} po_message_prev_msgctxt (po_message_t@tie{}@var{message})
The @code{po_message_prev_msgctxt} function returns the previous
@code{msgctxt}, the previous context of @var{message}. Returns
@code{NULL} for a message that does not have a previous context.
@end deftypefun

@deftypefun {void} po_message_set_prev_msgctxt (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{prev_msgctxt})
The @code{po_message_set_prev_msgctxt} function changes the previous
@code{msgctxt}, the previous context of the message, to the value provided
through @var{prev_msgctxt}.
The value @code{NULL} removes the stored
previous @code{msgctxt}.
@end deftypefun

@deftypefun {const char *} po_message_prev_msgid (po_message_t@tie{}@var{message})
The @code{po_message_prev_msgid} function returns the previous
@code{msgid} (untranslated English string) of @var{message}, or
@code{NULL} if there is no previous @code{msgid} stored.
@end deftypefun

@deftypefun {void} po_message_set_prev_msgid (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{prev_msgid})
The @code{po_message_set_prev_msgid} function changes the previous
@code{msgid} (untranslated English string) of @var{message} to the value
provided through @var{prev_msgid}, or removes the stored previous
@code{msgid} when it is @code{NULL}.
@end deftypefun

@deftypefun {const char *} po_message_prev_msgid_plural (po_message_t@tie{}@var{message})
The @code{po_message_prev_msgid_plural} function returns the previous
@code{msgid_plural} (untranslated English plural string) of
@var{message}, a message with plurals, or @code{NULL} for a message
without plural or without any stored previous @code{msgid_plural}.
@end deftypefun

@deftypefun {void} po_message_set_prev_msgid_plural (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{prev_msgid_plural})
The @code{po_message_set_prev_msgid_plural} function changes the
previous @code{msgid_plural} (untranslated English plural string) of a
message to the value provided through @var{prev_msgid_plural}, or
removes the stored previous @code{msgid_plural} if @code{NULL} is
provided as @var{prev_msgid_plural}.
@end deftypefun

@deftypefun {int} po_message_is_obsolete (po_message_t@tie{}@var{message})
The @code{po_message_is_obsolete} function returns true when @var{message}
is marked as obsolete.
@end deftypefun

@deftypefun {void} po_message_set_obsolete (po_message_t@tie{}@var{message}, int@tie{}@var{obsolete})
The @code{po_message_set_obsolete} function changes the obsolete mark of
@var{message}.
@end deftypefun

@deftypefun {int} po_message_is_fuzzy (po_message_t@tie{}@var{message})
The @code{po_message_is_fuzzy} function returns true when @var{message}
is marked as fuzzy.
@end deftypefun

@deftypefun {void} po_message_set_fuzzy (po_message_t@tie{}@var{message}, int@tie{}@var{fuzzy})
The @code{po_message_set_fuzzy} function changes the fuzzy mark of
@var{message}.
@end deftypefun

@deftypefun {int} po_message_is_format (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{format_type})
The @code{po_message_is_format} function returns true when the message
is marked as being a format string of @var{format_type}.
@end deftypefun

@deftypefun {void} po_message_set_format (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{format_type}, int@tie{}@var{value})
The @code{po_message_set_format} function changes the format mark of
the message for the @var{format_type} provided.
@end deftypefun

@deftypefun {int} po_message_is_range (po_message_t@tie{}@var{message}, int@tie{}*@var{minp}, int@tie{}*@var{maxp})
The @code{po_message_is_range} function returns true when the message
has a numeric range set, and stores the minimum and maximum value in the
locations pointed to by @var{minp} and @var{maxp} respectively.
@end deftypefun

@deftypefun {void} po_message_set_range (po_message_t@tie{}@var{message}, int@tie{}@var{min}, int@tie{}@var{max})
The @code{po_message_set_range} function changes the numeric range of
the message. @var{min} and @var{max} must be non-negative, with
@var{min} < @var{max}. Use @var{min} and @var{max} with value @code{-1}
to remove the numeric range of @var{message}.
@end deftypefun

@node PO Header Entry API
@subsection PO Header Entry API

The following functions provide an interface to extract and manipulate
the header entry (@pxref{Header Entry}) from a file loaded in memory.
The meta information must be written back into the domain message with
the empty string as @code{msgid}.

@deftypefun {const char *} po_file_domain_header (po_file_t@tie{}@var{file}, const@tie{}char@tie{}*@var{domain})
Returns the header entry of a domain from @var{file}, a PO file loaded in
memory. The value @code{NULL} provided as @var{domain} denotes the
default domain. Returns @code{NULL} if there is no header entry.
@end deftypefun

@deftypefun {char *} po_header_field (const@tie{}char@tie{}*@var{header}, const@tie{}char@tie{}*@var{field})
Returns the value of @var{field} in the @var{header} entry. The return
value is either a freshly allocated string, to be freed by the caller,
or @code{NULL}.
@end deftypefun

@deftypefun {char *} po_header_set_field (const@tie{}char@tie{}*@var{header}, const@tie{}char@tie{}*@var{field}, const@tie{}char@tie{}*@var{value})
Returns a freshly allocated string which contains the entry from
@var{header} with @var{field} set to @var{value}. The field is added if
necessary.
@end deftypefun

@node po_filepos_t API
@subsection po_filepos_t API

@deftp {Data Type} po_filepos_t
This is a pointer type that refers to a string's position within a
source file.
@end deftp

The following functions provide an interface to extract and manipulate
these references.

@deftypefun {po_filepos_t} po_message_filepos (po_message_t@tie{}@var{message}, int@tie{}@var{index})
Returns the file reference in position @var{index} from the message. If
@var{index} is out of range, returns @code{NULL}.
@end deftypefun

@deftypefun {void} po_message_remove_filepos (po_message_t@tie{}@var{message}, int@tie{}@var{index})
Removes the file reference in position @var{index} from the message. It
moves all references following @var{index} one position backwards.
@end deftypefun

@deftypefun {void} po_message_add_filepos (po_message_t@tie{}@var{message}, const@tie{}char@tie{}*@var{file}, size_t@tie{}@var{start_line})
Adds a reference to the string from @var{file} starting at
@var{start_line}, if it is not already present for the message. The
value @code{(size_t)(-1)} for @var{start_line} denotes that the line
number is not available.
@end deftypefun

@node Format Type API
@subsection Format Type API

@deftypefun {const char * const *} po_format_list (void)
Returns a @code{NULL} terminated array of the supported format types.
@end deftypefun

@deftypefun {const char *} po_format_pretty_name (const@tie{}char@tie{}*@var{format_type})
Returns the pretty name associated with @var{format_type}. For example,
it returns ``C#'' when @var{format_type} is ``csharp-format''.
Returns @code{NULL} if @var{format_type} is not a supported format type.
@end deftypefun

@node Checking API
@subsection Checking API

@deftypefun {void} po_file_check_all (po_file_t@tie{}@var{file}, po_xerror_handler_t@tie{}@var{handler})
Tests whether the entire @var{file} is valid, like @code{msgfmt} does. If it
is invalid, passes the reasons to @var{handler}.
@end deftypefun

@deftypefun {void} po_message_check_all (po_message_t@tie{}@var{message}, po_message_iterator_t@tie{}@var{iterator}, po_xerror_handler_t@tie{}@var{handler})
Tests @var{message}, to be inserted at @var{iterator} in a PO file in memory,
like @code{msgfmt} does. If it is invalid, passes the reasons to
@var{handler}.
@var{iterator} is not modified by this call; it only
specifies the file and the domain.
@end deftypefun

@deftypefun {void} po_message_check_format (po_message_t@tie{}@var{message}, po_xerror_handler_t@tie{}@var{handler})
Tests whether the message translation from @var{message} is a valid
format string if the message is marked as being a format string. If it
is invalid, passes the reasons to @var{handler}.

This function is exported as @samp{po_message_check_format_v2} at ABI
level, but is defined as @code{po_message_check_format} in C code after
the inclusion of @samp{<gettext-po.h>}.
@end deftypefun

@node Binaries
@chapter Producing Binary MO Files

@c FIXME: Rewrite.

@menu
* msgfmt Invocation::           Invoking the @code{msgfmt} Program
* msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
* MO Files::                    The Format of GNU MO Files
@end menu

@node msgfmt Invocation
@section Invoking the @code{msgfmt} Program

@include msgfmt.texi

@node msgunfmt Invocation
@section Invoking the @code{msgunfmt} Program

@include msgunfmt.texi

@node MO Files
@section The Format of GNU MO Files
@cindex MO file's format
@cindex file format, @file{.mo}

The format of the generated MO files is best described by a picture,
which appears below.

@cindex magic signature of MO files
The first two words serve to identify the file. The magic
number will always signal GNU MO files. The number is stored in the
byte order used when the MO file was generated, so the magic number
really is two numbers: @code{0x950412de} and @code{0xde120495}.

The second word describes the current revision of the file format,
composed of a major and a minor revision number.
The revision numbers
ensure that the readers of MO files can distinguish new formats from
old ones and handle their contents, as far as possible. For now the
major revision is 0 or 1, and the minor revision is also 0 or 1. More
revisions might be added in the future. A program seeing an unexpected
major revision number should stop reading the MO file entirely; whereas
an unexpected minor revision number means that the file can be read but
will not reveal its full contents, when parsed by a program that
supports only smaller minor revision numbers.

The version is kept
separate from the magic number, instead of using different magic
numbers for different formats, mainly because @file{/etc/magic} is
not updated often.

A number of pointers to later tables in the file follow, allowing
for the extension of the prefix part of MO files without having to
recompile programs reading them. This might become useful for later
inserting a few flag bits, indication about the charset used, new
tables, or other things.

Then, at offset @var{O} and offset @var{T} in the picture, two tables
of string descriptors can be found. In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
another for the offset of the string in the MO file, counting in bytes
from the start of the file. The first table contains descriptors
for the original strings, and is sorted so the original strings
are in increasing lexicographical order. The second table contains
descriptors for the translated strings, and is parallel to the first
table: to find the corresponding translation one has to access the
array slot in the second array with the same index.
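
The header fields and the byte-order check described above can be read
with a few lines of C. The following is only a sketch: the names
@code{struct mo_header}, @code{mo_word} and @code{mo_parse_header} are
hypothetical and not part of GNU @code{gettext}, and the split of the
revision word into a major number (upper 16 bits) and a minor number
(lower 16 bits) is an assumption that this section does not spell out.

@example
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the seven header words.  */
struct mo_header
@{
  int big_endian;         /* byte order the file was written in */
  uint16_t major, minor;  /* file format revision (assumed split) */
  uint32_t nstrings;      /* N */
  uint32_t orig_tab;      /* O */
  uint32_t trans_tab;     /* T */
  uint32_t hash_size;     /* S */
  uint32_t hash_tab;      /* H */
@};

/* Decode one 32-bit word from four bytes in the given byte order.  */
static uint32_t
mo_word (const unsigned char *p, int big_endian)
@{
  if (big_endian)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | p[3];
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | p[0];
@}

/* Return 0 on success, -1 if DATA does not start with a usable
   MO header.  */
static int
mo_parse_header (const unsigned char *data, size_t size,
                 struct mo_header *h)
@{
  uint32_t revision;

  if (size < 28)
    return -1;
  /* The magic number tells which byte order the writer used.  */
  if (mo_word (data, 0) == 0x950412deU)
    h->big_endian = 0;
  else if (mo_word (data, 1) == 0x950412deU)
    h->big_endian = 1;
  else
    return -1;

  revision = mo_word (data + 4, h->big_endian);
  h->major = (uint16_t) (revision >> 16);
  h->minor = (uint16_t) (revision & 0xffff);
  /* An unexpected major revision means: stop reading.  */
  if (h->major > 1)
    return -1;

  h->nstrings = mo_word (data + 8, h->big_endian);
  h->orig_tab = mo_word (data + 12, h->big_endian);
  h->trans_tab = mo_word (data + 16, h->big_endian);
  h->hash_size = mo_word (data + 20, h->big_endian);
  h->hash_tab = mo_word (data + 24, h->big_endian);
  return 0;
@}
@end example

@noindent
Decoding byte by byte avoids any dependence on the byte order of the
reading machine. A real reader would additionally have to check that
the table offsets and the string descriptors stay within the file.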

Having the original strings sorted enables the use of simple binary
search, for when the MO file does not contain a hashing table, or
for when it is not practical to use the hashing table provided in
the MO file. This has another advantage: GNU @code{gettext} usually
@emph{translates} the empty string of a PO file into
some system information attached to that particular MO file, and the
empty string necessarily becomes the first in both the original and
translated tables, making the system information very easy to find.

@cindex hash table, inside MO files
The size @var{S} of the hash table can be zero. In this case, the
hash table itself is not contained in the MO file. Some people might
prefer this because a precomputed hashing table takes disk space, and
does not win @emph{that} much speed. The hash table contains indices
to the sorted array of strings in the MO file. Conflict resolution is
done by double hashing. The precise hashing algorithm used is fairly
dependent on GNU @code{gettext} code, and is not documented here.

As for the strings themselves, they follow the hash table, and each
is terminated with a @key{NUL}, and this @key{NUL} is not counted in
the length which appears in the string descriptor. The @code{msgfmt}
program has an option selecting the alignment for MO file strings.
With this option, each string is separately aligned so it starts at
an offset which is a multiple of the alignment value. On some RISC
machines, a correct alignment will speed things up.

@cindex context, in MO files
Contexts are stored by storing the concatenation of the context, an
@key{EOT} byte, and the original string, instead of the original string.
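
The binary search over the sorted original strings and the
construction of a lookup key for a message with a context can be
sketched as follows. The type @code{struct mo_entry} and the
functions @code{mo_lookup} and @code{mo_context_key} are hypothetical
names used only for this illustration; the sketch assumes that the
two descriptor tables have already been resolved into pointers to
the strings, and that the strings were sorted in @code{strcmp} order.

@example
#include <stddef.h>
#include <string.h>

/* Hypothetical resolved view of slot i of both descriptor tables.  */
struct mo_entry
@{
  const char *orig;   /* original string, or context EOT original */
  const char *trans;  /* translation stored at the same index */
@};

/* Binary search in ENTRIES, sorted by ORIG in increasing order.
   Returns the translation, or NULL if KEY is not present.  */
static const char *
mo_lookup (const struct mo_entry *entries, size_t n, const char *key)
@{
  size_t lo = 0;
  size_t hi = n;

  while (lo < hi)
    @{
      size_t mid = lo + (hi - lo) / 2;
      int cmp = strcmp (key, entries[mid].orig);

      if (cmp == 0)
        return entries[mid].trans;
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    @}
  return NULL;
@}

/* Store "MSGCTXT EOT MSGID" in BUF.  Returns 0 on success, or -1
   if BUF is too small.  */
static int
mo_context_key (char *buf, size_t bufsize,
                const char *msgctxt, const char *msgid)
@{
  size_t clen = strlen (msgctxt);
  size_t ilen = strlen (msgid);

  if (clen + 1 + ilen + 1 > bufsize)
    return -1;
  memcpy (buf, msgctxt, clen);
  buf[clen] = '\004';                        /* the EOT byte */
  memcpy (buf + clen + 1, msgid, ilen + 1);  /* includes the NUL */
  return 0;
@}
@end example

@noindent
Looking up the empty string with @code{mo_lookup} yields the system
information mentioned above, since the empty string sorts first.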

@cindex plural forms, in MO files
Plural forms are stored by letting the plural of the original string
follow the singular of the original string, separated through a
@key{NUL} byte. The length which appears in the string descriptor
includes both. However, only the singular of the original string
takes part in the hash table lookup. The plural variants of the
translation are all stored consecutively, separated through a
@key{NUL} byte. Here also, the length in the string descriptor
includes all of them.

Nothing prevents an MO file from having embedded @key{NUL}s in strings.
However, the program interface currently used already presumes
that strings are @key{NUL} terminated, so embedded @key{NUL}s are
somewhat useless. But the MO file format is general enough so other
interfaces would be later possible, if for example, we ever want to
implement wide characters right in MO files, where @key{NUL} bytes may
accidentally appear. (No, we don't want to have wide characters in MO
files. They would make the file unnecessarily large, and the
@samp{wchar_t} type being platform dependent, MO files would be
platform dependent as well.)

This particular issue has been strongly debated in the GNU
@code{gettext} development forum, and it is to be expected that the MO file
format will evolve or change over time. It is even possible that many
formats may later be supported concurrently. But surely, we have to
start somewhere, and the MO file format described here is a good start.
Nothing is cast in concrete, and the format may later evolve fairly
easily, so we should feel comfortable with the current approach.

@example
@group
        byte
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .    (possibly more entries later)         .
             .                                          .
             |                                          |
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  <------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  <---------------' |
             |                                          |        |
             | NUL terminated 1st translation  <-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
@end group
@end example

@node Programmers
@chapter The Programmer's View

@c FIXME: Reorganize whole chapter.

One aim of the current message catalog implementation provided by
GNU @code{gettext} was to use the system's message catalog handling, if the
installer wishes to do so. So we perhaps should first take a look at
the solutions we know about. The people in the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below.
In fact they couldn't agree on anything, so they decided
only to include an example of an interface. The major Unix vendors
are split in the usage of the two most important specifications: X/Open's
catgets vs. Uniforum's gettext interface. We'll describe them both and
later explain our solution to this dilemma.

@menu
* catgets::                     About @code{catgets}
* gettext::                     About @code{gettext}
* Comparison::                  Comparing the two interfaces
* Using libintl.a::             Using libintl.a in own programs
* gettext grok::                Being a @code{gettext} grok
* Temp Programmers::            Temporary Notes for the Programmers Chapter
@end menu

@node catgets
@section About @code{catgets}
@cindex @code{catgets}, X/Open specification

The @code{catgets} implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
process of creating this standard seemed to be too slow for some of
the Unix vendors so they based their implementations on preliminary
versions of the standard. Of course this leads again to problems while
writing platform independent programs: even the usage of @code{catgets}
does not guarantee a unique interface.

Another, personal comment on this: only a bunch of committee members
could have made this interface. They never really tried to program
using this interface. It is a fast, memory-saving implementation, a
user can happily live with it. But programmers hate it (at least I and
some others do@dots{})

But we must not forget one point: after all the trouble with transferring
the rights on Unix they at last came to X/Open, the very same who
published this specification. This leads me to making the prediction
that this interface will be in future Unix standards (e.g.@: Spec1170) and
therefore part of all Unix implementations (implementations which are
@emph{allowed} to bear this name).

@menu
* Interface to catgets::        The interface
* Problems with catgets::       Problems with the @code{catgets} interface?!
@end menu

@node Interface to catgets
@subsection The Interface
@cindex interface to @code{catgets}

The interface to the @code{catgets} implementation consists of three
functions which correspond to those used in file access: @code{catopen}
to open the catalog for using, @code{catgets} for accessing the message
tables, and @code{catclose} for closing after work is done. Prototypes
for the functions and the needed definitions are in the
@code{<nl_types.h>} header file.

@cindex @code{catopen}, a @code{catgets} function
@code{catopen} is used like this:

@example
nl_catd catd = catopen ("catalog_name", 0);
@end example

The function takes as the argument the name of the catalog. This usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems. So the common advice
is to use @code{0} as the value. The return value is a handle to the
message catalog, equivalent to the file handles returned by @code{open}.

@cindex @code{catgets}, a @code{catgets} function
This handle is of course used in the @code{catgets} function which can
be used like this:

@example
char *translation = catgets (catd, set_no, msg_id, "original string");
@end example

The first parameter is this catalog descriptor. The second parameter
specifies the set of messages in this catalog, in which the message
described by @code{msg_id} is obtained. @code{catgets} therefore uses a
three-stage addressing:

@display
catalog name @result{} set number @result{} message ID @result{} translation
@end display

@c Anybody else loving Haskell???
@c :-) -- Uli

The fourth argument is not used to address the translation. It is given
as a default value in case one of the addressing stages fails. One
important thing to remember is that although the return type of @code{catgets}
is @code{char *} the resulting string @emph{must not} be changed. It
should better be @code{const char *}, but the standard was published in
1988, one year before ANSI C.

@noindent
@cindex @code{catclose}, a @code{catgets} function
The last of these functions is used and behaves as expected:

@example
catclose (catd);
@end example

After this no @code{catgets} call using the descriptor is legal anymore.

@node Problems with catgets
@subsection Problems with the @code{catgets} Interface?!
@cindex problems with @code{catgets} interface

This description made everything seem really easy, so where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of @code{catgets}: the unique
message ID. This has to be a numeric value for all messages in a single
set. Perhaps you could imagine the problems keeping such a list while
changing the source code. Add a new message here, remove one there. Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems but they are far
easier to manage.

@node gettext
@section About @code{gettext}
@cindex @code{gettext}, a programmer's view

The definition of the @code{gettext} interface comes from a Uniforum
proposal. It was submitted there by Sun, who had implemented the
@code{gettext} function in SunOS 4, around 1990.
Nowadays, the
@code{gettext} interface is specified by the OpenI18N standard.

The main point about this solution is that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course here also a unique key is needed, but this key is the message
itself (however long or short it is). See @ref{Comparison} for a more
detailed comparison of the two methods.

The following section contains a rather detailed description of the
interface. We make it that detailed because this is the interface
we chose for the GNU @code{gettext} Library. Programmers interested
in using this library will be interested in this description.

@menu
* Interface to gettext::        The interface
* Ambiguities::                 Solving ambiguities
* Locating Catalogs::           Locating message catalog files
* Charset conversion::          How to request conversion to Unicode
* Contexts::                    Solving ambiguities in GUI programs
* Plural forms::                Additional functions for handling plurals
* Optimized gettext::           Optimization of the *gettext functions
@end menu

@node Interface to gettext
@subsection The Interface
@cindex @code{gettext} interface

The minimal functionality an interface must have is a) to select a
domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance is difficult,
perhaps impossible) and b) to access a string in a selected domain.

This is essentially the description of the @code{gettext} interface. It
has a global domain which unqualified usages reference. Of course this
domain is selectable by the user.

@example
char *textdomain (const char *domain_name);
@end example

This provides the possibility to change or query the current status of
the current global domain of the @code{LC_MESSAGES} category. The
argument is a null-terminated string, whose characters must be legal in
file names. If the @var{domain_name} argument is @code{NULL},
the function returns the current value. If no value has been set
before, the name of the default domain is returned: @emph{messages}.
Please note that although the return value of @code{textdomain} is of
type @code{char *} no changing is allowed. It is also important to know
that no checks of the availability are made. If the name is not
available you will see this by the fact that no translations are provided.

@noindent
To use a domain set by @code{textdomain} the function

@example
char *gettext (const char *msgid);
@end example

@noindent
is to be used. This is the simplest reasonable form one can imagine.
The translation of the string @var{msgid} is returned if it is available
in the current domain. If it is not available, the argument itself is
returned. If the argument is @code{NULL} the result is undefined.

One thing which should come to mind is that no explicit dependency on
the used domain is given. The current value of the domain is used.
If this changes between two
executions of the same @code{gettext} call in the program, the two calls
reference different message catalogs.

For the easiest case, which is normally used in internationalized
packages, once at the beginning of execution a call to @code{textdomain}
is issued, setting the domain to a unique name, normally the package
name. In the following code all strings which have to be translated are
filtered through the @code{gettext} function. That's all, the package speaks
your language.
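
The easiest case just described might look like the following C sketch.
The domain name @code{my-package}, the helper functions
@code{init_i18n} and @code{greeting}, and the @code{_} shorthand macro
are assumptions made for this illustration; the call to
@code{setlocale} is included on the assumption that the program wants
the locale taken from the user's environment.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical package name, used as the domain name.  */
#define PACKAGE "my-package"

/* Common shorthand for gettext; a convention, not part of libintl.h.  */
#define _(msgid) gettext (msgid)

/* Call once at the beginning of execution.  */
static void
init_i18n (void)
{
  /* Adopt the locale settings of the user's environment.  */
  setlocale (LC_ALL, "");
  /* Select the global domain used by unqualified gettext calls.  */
  textdomain (PACKAGE);
}

/* Every translatable string is filtered through gettext.  When no
   catalog for the domain is found, the argument itself comes back.  */
static const char *
greeting (void)
{
  return _("Hello, world!");
}
```

A program would call @code{init_i18n ()} first and then print
@code{greeting ()}; without an installed catalog for the domain the
English text is shown unchanged.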

@node Ambiguities
@subsection Solving Ambiguities
@cindex several domains
@cindex domain ambiguities
@cindex large package

While this single name domain works well for most applications there
might be the need to get translations from more than one domain. Of
course one could switch between different domains with calls to
@code{textdomain}, but this is neither convenient nor fast. A
possible situation could be one case subject to discussion during this
writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain @code{error}. By this means we would only need
to translate them once.
Another case is messages from a library, as these @emph{have} to be
independent of the current domain set by the application.

@noindent
For these reasons there are two more functions to retrieve strings:

@example
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
@end example

Both take an additional argument in the first place, which corresponds
to the argument of @code{textdomain}. The third argument of
@code{dcgettext} allows the use of a locale category other than @code{LC_MESSAGES}.
But I really don't know where this can be useful. If the
@var{domain_name} is @code{NULL} or @var{category} has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.

A second ambiguity can arise from the fact that more than one
domain may have the same name. This can be solved by specifying where the
needed message catalog files can be found.

@example
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
@end example

Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using @code{textdomain}). A
@code{NULL} pointer for the @var{dir_name} parameter returns the binding
associated with @var{domain_name}. If @var{domain_name} itself is
@code{NULL} nothing happens and a @code{NULL} pointer is returned. Here
again, as for all the other functions, the return
value must not be modified!

It is important to remember that relative path names for the
@var{dir_name} parameter can be trouble. Since the path is always
computed relative to the current directory different results will be
achieved when the program executes a @code{chdir} command. Relative
paths should always be avoided to avoid dependencies and
unreliabilities.

@example
wchar_t *wbindtextdomain (const char *domain_name,
                          const wchar_t *dir_name);
@end example

This function is provided only on native Windows platforms. It is like
@code{bindtextdomain}, except that the @var{dir_name} parameter is a
wide string (in UTF-16 encoding, as usual on Windows).

@node Locating Catalogs
@subsection Locating Message Catalog Files
@cindex message catalog files location

Because many different languages for many different packages have to be
stored we need some way to add this information to message catalog
files. The way usually used in Unix environments is to have this encoding
in the file name. This is also done here.
The directory name given in
@code{bindtextdomain}'s second argument (or the default directory),
followed by the name of the locale, the locale category, and the domain name
are concatenated:

@example
@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
@end example

The default value for @var{dir_name} is system specific. For the GNU
library, and for packages adhering to its conventions, it's:
@example
/usr/local/share/locale
@end example

@noindent
@var{locale} is the name of the locale category which is designated by
@code{LC_@var{category}}. For @code{gettext} and @code{dgettext} this
@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
systems, e.g.@: mingw, don't have @code{LC_MESSAGES}. Here we use a more or
less arbitrary value for it, namely 1729, the smallest positive integer
which can be represented in two different ways as the sum of two cubes.}
The name of the locale category is determined through
@code{setlocale (LC_@var{category}, NULL)}.
@footnote{When the system does not support @code{setlocale} its behavior
in setting the locale values is simulated by looking at the environment
variables.}
When using the function @code{dcgettext}, you can specify the locale category
through the third argument.

@node Charset conversion
@subsection How to specify the output character set @code{gettext} uses
@cindex charset conversion at runtime
@cindex encoding conversion at runtime

@code{gettext} not only looks up a translation in a message catalog. It
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

The output character set is, by default, the value of @code{nl_langinfo
(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
locale. But programs which store strings in a locale independent way
(e.g.@: UTF-8) can request that @code{gettext} and related functions
return the translations in that encoding, by use of the
@code{bind_textdomain_codeset} function.

Note that the @var{msgid} argument to @code{gettext} is not subject to
character set conversion. Also, when @code{gettext} does not find a
translation for @var{msgid}, it returns @var{msgid} unchanged --
independently of the current output character set. It is therefore
recommended that all @var{msgid}s be US-ASCII strings.

@deftypefun {char *} bind_textdomain_codeset (const@tie{}char@tie{}*@var{domainname}, const@tie{}char@tie{}*@var{codeset})
The @code{bind_textdomain_codeset} function can be used to specify the
output character set for message catalogs for domain @var{domainname}.
The @var{codeset} argument must be a valid codeset name which can be used
for the @code{iconv_open} function, or a null pointer.

If the @var{codeset} parameter is the null pointer,
@code{bind_textdomain_codeset} returns the currently selected codeset
for the domain with the name @var{domainname}. It returns @code{NULL} if
no codeset has yet been selected.

The @code{bind_textdomain_codeset} function can be used several times.
If used multiple times with the same @var{domainname} argument, the
later call overrides the settings made by the earlier one.

The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user.
If the system ran out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @var{errno} is set accordingly.
@end deftypefun

@node Contexts
@subsection Using contexts for solving ambiguities
@cindex context
@cindex GUI programs
@cindex translating menu entries
@cindex menu entries

One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus which restricts the
length. But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program but might have different translations. This is
especially true for the one-word strings which are frequently used in
GUI programs.

As a consequence many people say that the @code{gettext} approach is
wrong and instead @code{catgets} should be used which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.

Contexts can be added to strings to be translated. A context dependent
translation lookup is a search for the translation of a given string
that is limited to a given context. The translation for the same string
in a different context can be different. The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.

The @file{gettext.h} include file contains the lookup macros for strings
with contexts. They are implemented as thin macros and inline functions
over the functions from @code{<libintl.h>}.

@findex pgettext
@example
const char *pgettext (const char *msgctxt, const char *msgid);
@end example

In a call of this macro, @var{msgctxt} and @var{msgid} must be string
literals. The macro returns the translation of @var{msgid}, restricted
to the context given by @var{msgctxt}.

The @var{msgctxt} string is visible in the PO file to the translator.
You should try to make it somehow canonical and never changing, because
every time you change a @var{msgctxt}, the translator will have to review
the translation of @var{msgid}.

Finding a canonical @var{msgctxt} string that doesn't change over time can
be hard. But you shouldn't use the file name or class name containing the
@code{pgettext} call -- because it is a common development task to rename
a file or a class, and it shouldn't cause translator work. Also you shouldn't
use a comment in the form of a complete English sentence as @var{msgctxt} --
because orthography or grammar changes are often applied to such sentences,
and again, it shouldn't force the translator to do a review.

The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext}
fetches a particular translation of the @var{msgid}.

@findex dpgettext
@findex dcpgettext
@example
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
@end example

These are generalizations of @code{pgettext}. They behave similarly to
@code{dgettext} and @code{dcgettext}, respectively. The @var{domain_name}
argument defines the translation domain. The @var{category} argument
allows the use of a locale category other than @code{LC_MESSAGES}.

As an example consider the following fictional situation.
A GUI program
has a menu bar with the following entries:

@smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
@end smallexample

To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated there has to be
at some point in the code a call to a function of the @code{gettext}
family. But in two places the string passed into the function would be
@code{Open}. The translations might not be the same and therefore we
are in the dilemma described above.

What distinguishes the two places is the menu path from the menu root to
the particular menu entries:

@smallexample
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
@end smallexample

The context is thus the menu path without its last part. So, the calls
look like this:

@smallexample
pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")
@end smallexample

Whether or not to use the @samp{|} character at the end of the context is a
matter of style.
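
To give an idea of how such a lookup can sit on top of plain
@code{gettext}, here is a minimal C sketch. It is not the code from
@file{gettext.h}; the names @code{SKETCH_PGETTEXT} and
@code{sketch_pgettext_aux} are hypothetical. The sketch assumes that
the key is the context, an @key{EOT} byte and the @var{msgid} joined
together, and that @code{gettext} returns its argument when no
translation is found.

```c
#include <libintl.h>

/* Hypothetical helper: look up the combined key MSG_CTXT_ID and fall
   back to the plain MSGID when no translation exists.  */
static const char *
sketch_pgettext_aux (const char *msg_ctxt_id, const char *msgid)
{
  const char *translation = gettext (msg_ctxt_id);

  /* gettext hands back its argument when nothing is found; the user
     must then see MSGID, not the key with the embedded context.  */
  if (translation == msg_ctxt_id)
    return msgid;
  return translation;
}

/* Hypothetical macro.  Both arguments must be string literals,
   because the key is built by string literal concatenation.  */
#define SKETCH_PGETTEXT(msgctxt, msgid) \
  sketch_pgettext_aux (msgctxt "\004" msgid, msgid)
```

With this sketch, @code{SKETCH_PGETTEXT ("Menu|File|", "Open")} and
@code{SKETCH_PGETTEXT ("Menu|Printer|", "Open")} use different keys
and can therefore receive different translations, while both fall back
to @code{Open} when no catalog is available.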

For more complex cases, where the @var{msgctxt} or @var{msgid} are not
string literals, more general macros are available:

@findex pgettext_expr
@findex dpgettext_expr
@findex dcpgettext_expr
@example
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
@end example

Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions.
These macros are more general. But in the case that both argument expressions
are string literals, the macros without the @samp{_expr} suffix are more
efficient.

@node Plural forms
@subsection Additional functions for plural forms
@cindex plural forms

The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.

Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:

@smallexample
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
@end smallexample

@noindent
After the first complaints from people internationalizing the code people
either completely avoided formulations like this or used strings like
@code{"file(s)"}. Both look unnatural and should be avoided. First
tries to solve the problem correctly looked like this:

@smallexample
   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);
@end smallexample

But this does not solve the problem.
It helps languages where the
plural form of a noun is not simply constructed by adding an
@ifhtml
‘s’
@end ifhtml
@ifnothtml
`s'
@end ifnothtml
but that is all. Once again people fell into the trap of believing the
rules their language is using are universal. But the handling of plural
forms differs widely between the language families. For example,
Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:

@quotation
In Polish we use e.g.@: plik (file) this way:
@example
1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w
@end example
and so on (o' means 8859-2 oacute which should be rather okreska,
similar to aogonek).
@end quotation

There are two things which can differ between languages (and even inside
language families):

@itemize @bullet
@item
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an
@ifhtml
‘s’)
@end ifhtml
@ifnothtml
`s')
@end ifnothtml
is hardly found in German.

@item
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this in an extra section.
@end itemize

The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization since it is
only usable for certain, hardcoded language environments. Instead the
extended @code{gettext} interface should be used.
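
To see how language specific such a rule is, the Polish pattern quoted
above can be written out as a C function. This is only an illustration
of what should @emph{not} be hardcoded in an application; the function
name @code{polish_plural_index} is hypothetical, and the treatment of
numbers beyond the quoted range (for example 112) is an extrapolation
of the quoted pattern.

```c
/* Hypothetical helper: select the Polish form of "plik" for N.
   Returns 0 for "plik", 1 for "pliki", 2 for "pliko'w".  */
static int
polish_plural_index (unsigned long int n)
{
  if (n == 1)
    return 0;
  /* 2, 3, 4, 22, 23, 24, ... but not 12, 13, 14.  */
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
    return 1;
  return 2;
}
```

A rule of this kind is correct for one language only, which is why it
belongs with the translation rather than in the program, and why the
extended @code{gettext} interface takes the number as an argument.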

These extra functions take, instead of the one key string, two
strings and a numerical argument. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select, using rules specified by the translator, the right plural
form. The two string arguments then will be used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation but since the GNU C library
(as well as the GNU @code{gettext} package) are written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

@deftypefun {char *} ngettext (const@tie{}char@tie{}*@var{msgid1}, const@tie{}char@tie{}*@var{msgid2}, unsigned@tie{}long@tie{}int@tie{}@var{n})
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form. If no
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.
6612 6613An example for the use of this function is: 6614 6615@smallexample 6616printf (ngettext ("%d file removed", "%d files removed", n), n); 6617@end smallexample 6618 6619Please note that the numeric value @var{n} has to be passed to the 6620@code{printf} function as well. It is not sufficient to pass it only to 6621@code{ngettext}. 6622 6623In the English singular case, the number -- always 1 -- can be replaced with 6624"one": 6625 6626@smallexample 6627printf (ngettext ("One file removed", "%d files removed", n), n); 6628@end smallexample 6629 6630@noindent 6631This works because the @samp{printf} function discards excess arguments that 6632are not consumed by the format string. 6633 6634If this function is meant to yield a format string that takes two or more 6635arguments, you can not use it like this: 6636 6637@smallexample 6638printf (ngettext ("%d file removed from directory %s", 6639 "%d files removed from directory %s", 6640 n), 6641 n, dir); 6642@end smallexample 6643 6644@noindent 6645because in many languages the translators want to replace the @samp{%d} 6646with an explicit word in the singular case, just like ``one'' in English, 6647and C format strings cannot consume the second argument but skip the first 6648argument. Instead, you have to reorder the arguments so that @samp{n} 6649comes last: 6650 6651@smallexample 6652printf (ngettext ("%2$d file removed from directory %1$s", 6653 "%2$d files removed from directory %1$s", 6654 n), 6655 dir, n); 6656@end smallexample 6657 6658@noindent 6659See @ref{c-format} for details about this argument reordering syntax. 6660 6661When you know that the value of @code{n} is within a given range, you can 6662specify it as a comment directed to the @code{xgettext} tool. This 6663information may help translators to use more adequate translations. 
For example:

@smallexample
if (days > 7 && days < 14)
  /* xgettext: range: 1..6 */
  printf (ngettext ("one week and one day", "one week and %d days",
                    days - 7),
          days - 7);
@end smallexample

It is also possible to use this function when the strings don't contain a
cardinal number:

@smallexample
puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));
@end smallexample

In this case the number @var{n} is only used to choose the plural form.
@end deftypefun

@deftypefun {char *} dngettext (const@tie{}char@tie{}*@var{domain}, const@tie{}char@tie{}*@var{msgid1}, const@tie{}char@tie{}*@var{msgid2}, unsigned@tie{}long@tie{}int@tie{}@var{n})
The @code{dngettext} function is similar to the @code{dgettext} function
in the way the message catalog is selected.  The difference is that it
takes two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun

@deftypefun {char *} dcngettext (const@tie{}char@tie{}*@var{domain}, const@tie{}char@tie{}*@var{msgid1}, const@tie{}char@tie{}*@var{msgid2}, unsigned@tie{}long@tie{}int@tie{}@var{n}, int@tie{}@var{category})
The @code{dcngettext} function is similar to the @code{dcgettext} function
in the way the message catalog is selected.  The difference is that it
takes two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun

Now, how do these functions solve the problem of the plural forms?
Without the input of linguists (which was not available) it was not
possible to determine whether there are only a few different ways in
which plural forms are built or whether the number can increase with
every new supported language.
6705 6706Therefore the solution implemented is to allow the translator to specify 6707the rules of how to select the plural form. Since the formula varies 6708with every language this is the only viable solution except for 6709hardcoding the information in the code (which still would require the 6710possibility of extensions to not prevent the use of new languages). 6711 6712@cindex specifying plural form in a PO file 6713@kwindex nplurals@r{, in a PO file header} 6714@kwindex plural@r{, in a PO file header} 6715The information about the plural form selection has to be stored in the 6716header entry of the PO file (the one with the empty @code{msgid} string). 6717The plural form information looks like this: 6718 6719@smallexample 6720Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1; 6721@end smallexample 6722 6723The @code{nplurals} value must be a decimal number which specifies how 6724many different plural forms exist for this language. The string 6725following @code{plural} is an expression which is using the C language 6726syntax. Exceptions are that no negative numbers are allowed, numbers 6727must be decimal, and the only variable allowed is @code{n}. Spaces are 6728allowed in the expression, but backslash-newlines are not; in the 6729examples below the backslash-newlines are present for formatting purposes 6730only. This expression will be evaluated whenever one of the functions 6731@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The 6732numeric value passed to these functions is then substituted for all uses 6733of the variable @code{n} in the expression. The resulting value then 6734must be greater or equal to zero and smaller than the value given as the 6735value of @code{nplurals}. 6736 6737@noindent 6738@cindex plural form formulas 6739The following rules are known at this point. The language with families 6740are listed. 
But this does not necessarily mean the information can be 6741generalized for the whole family (as can be easily seen in the table 6742below).@footnote{Additions are welcome. Send appropriate information to 6743@email{bug-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}. 6744The Unicode CLDR Project (@uref{http://cldr.unicode.org}) provides a 6745comprehensive set of plural forms in a different format. The 6746@code{msginit} program has preliminary support for the format so you can 6747use it as a baseline (@pxref{msginit Invocation}).} 6748 6749@table @asis 6750@item Only one form: 6751Some languages only require one single form. There is no distinction 6752between the singular and plural form. An appropriate header entry 6753would look like this: 6754 6755@smallexample 6756Plural-Forms: nplurals=1; plural=0; 6757@end smallexample 6758 6759@noindent 6760Languages with this property include: 6761 6762@table @asis 6763@item Asian family 6764Japanese, @c 122.1 million speakers 6765Vietnamese, @c 68.6 million speakers 6766Korean @c 66.3 million speakers 6767@item Tai-Kadai family 6768Thai @c 20.4 million speakers 6769@end table 6770 6771@item Two forms, singular used for one only 6772This is the form used in most existing programs since it is what English 6773is using. A header entry would look like this: 6774 6775@smallexample 6776Plural-Forms: nplurals=2; plural=n != 1; 6777@end smallexample 6778 6779(Note: this uses the feature of C expressions that boolean expressions 6780have to value zero or one.) 
6781 6782@noindent 6783Languages with this property include: 6784 6785@table @asis 6786@item Germanic family 6787English, @c 328.0 million speakers 6788German, @c 96.9 million speakers 6789Dutch, @c 21.7 million speakers 6790Swedish, @c 8.3 million speakers 6791Danish, @c 5.6 million speakers 6792Norwegian, @c 4.6 million speakers 6793Faroese @c 0.05 million speakers 6794@item Romanic family 6795Spanish, @c 328.5 million speakers 6796Portuguese, @c 178.0 million speakers - 163 million Brazilian Portuguese 6797Italian @c 61.7 million speakers 6798@item Latin/Greek family 6799Greek @c 13.1 million speakers 6800@item Slavic family 6801Bulgarian @c 9.1 million speakers 6802@item Finno-Ugric family 6803Finnish, @c 5.0 million speakers 6804Estonian @c 1.0 million speakers 6805@item Semitic family 6806Hebrew @c 5.3 million speakers 6807@item Austronesian family 6808Bahasa Indonesian @c 23.2 million speakers 6809@item Artificial 6810Esperanto @c 2 million speakers 6811@end table 6812 6813@noindent 6814Other languages using the same header entry are: 6815 6816@table @asis 6817@item Finno-Ugric family 6818Hungarian @c 12.5 million speakers 6819@item Turkic/Altaic family 6820Turkish @c 50.8 million speakers 6821@end table 6822 6823Hungarian does not appear to have a plural if you look at sentences involving 6824cardinal numbers. For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is 6825``123 alma''. But when the number is not explicit, the distinction between 6826singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is 6827``az alm@'{a}k''. Since @code{ngettext} has to support both types of sentences, 6828it is classified here, under ``two forms''. 6829 6830The same holds for Turkish: ``1 apple'' is ``1 elma'', and ``123 apples'' is 6831``123 elma''. But when the number is omitted, the distinction between singular 6832and plural exists: ``the apple'' is ``elma'', and ``the apples'' is 6833``elmalar''. 
6834 6835@item Two forms, singular used for zero and one 6836Exceptional case in the language family. The header entry would be: 6837 6838@smallexample 6839Plural-Forms: nplurals=2; plural=n>1; 6840@end smallexample 6841 6842@noindent 6843Languages with this property include: 6844 6845@table @asis 6846@item Romanic family 6847Brazilian Portuguese, @c 163 million speakers 6848French @c 67.8 million speakers 6849@end table 6850 6851@item Three forms, special case for zero 6852The header entry would be: 6853 6854@smallexample 6855Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2; 6856@end smallexample 6857 6858@noindent 6859Languages with this property include: 6860 6861@table @asis 6862@item Baltic family 6863Latvian @c 1.5 million speakers 6864@end table 6865 6866@item Three forms, special cases for one and two 6867The header entry would be: 6868 6869@smallexample 6870Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2; 6871@end smallexample 6872 6873@noindent 6874Languages with this property include: 6875 6876@table @asis 6877@item Celtic 6878Gaeilge (Irish) @c 0.4 million speakers 6879@end table 6880 6881@item Three forms, special case for numbers ending in 00 or [2-9][0-9] 6882The header entry would be: 6883 6884@smallexample 6885Plural-Forms: nplurals=3; \ 6886 plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2; 6887@end smallexample 6888 6889@noindent 6890Languages with this property include: 6891 6892@table @asis 6893@item Romanic family 6894Romanian @c 23.4 million speakers 6895@end table 6896 6897@item Three forms, special case for numbers ending in 1[2-9] 6898The header entry would look like this: 6899 6900@smallexample 6901Plural-Forms: nplurals=3; \ 6902 plural=n%10==1 && n%100!=11 ? 0 : \ 6903 n%10>=2 && (n%100<10 || n%100>=20) ? 
1 : 2; 6904@end smallexample 6905 6906@noindent 6907Languages with this property include: 6908 6909@table @asis 6910@item Baltic family 6911Lithuanian @c 3.2 million speakers 6912@end table 6913 6914@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4] 6915The header entry would look like this: 6916 6917@smallexample 6918Plural-Forms: nplurals=3; \ 6919 plural=n%10==1 && n%100!=11 ? 0 : \ 6920 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; 6921@end smallexample 6922 6923@noindent 6924Languages with this property include: 6925 6926@table @asis 6927@item Slavic family 6928Russian, @c 143.6 million speakers 6929Ukrainian, @c 37.0 million speakers 6930Belarusian, @c 8.6 million speakers 6931Serbian, @c 7.0 million speakers 6932Croatian @c 5.5 million speakers 6933@end table 6934 6935@item Three forms, special cases for 1 and 2, 3, 4 6936The header entry would look like this: 6937 6938@smallexample 6939Plural-Forms: nplurals=3; \ 6940 plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2; 6941@end smallexample 6942 6943@noindent 6944Languages with this property include: 6945 6946@table @asis 6947@item Slavic family 6948Czech, @c 9.5 million speakers 6949Slovak @c 5.0 million speakers 6950@end table 6951 6952@item Three forms, special case for one and some numbers ending in 2, 3, or 4 6953The header entry would look like this: 6954 6955@smallexample 6956Plural-Forms: nplurals=3; \ 6957 plural=n==1 ? 0 : \ 6958 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; 6959@end smallexample 6960 6961@noindent 6962Languages with this property include: 6963 6964@table @asis 6965@item Slavic family 6966Polish @c 40.0 million speakers 6967@end table 6968 6969@item Four forms, special case for one and all numbers ending in 02, 03, or 04 6970The header entry would look like this: 6971 6972@smallexample 6973Plural-Forms: nplurals=4; \ 6974 plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 
2 : 3; 6975@end smallexample 6976 6977@noindent 6978Languages with this property include: 6979 6980@table @asis 6981@item Slavic family 6982Slovenian @c 1.9 million speakers 6983@end table 6984 6985@item Six forms, special cases for one, two, all numbers ending in 02, 03, @dots{} 10, all numbers ending in 11 @dots{} 99, and others 6986The header entry would look like this: 6987 6988@smallexample 6989Plural-Forms: nplurals=6; \ 6990 plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \ 6991 : n%100>=11 ? 4 : 5; 6992@end smallexample 6993 6994@noindent 6995Languages with this property include: 6996 6997@table @asis 6998@item Afroasiatic family 6999Arabic @c 246.0 million speakers 7000@end table 7001@end table 7002 7003You might now ask, @code{ngettext} handles only numbers @var{n} of type 7004@samp{unsigned long}. What about larger integer types? What about negative 7005numbers? What about floating-point numbers? 7006 7007About larger integer types, such as @samp{uintmax_t} or 7008@samp{unsigned long long}: they can be handled by reducing the value to a 7009range that fits in an @samp{unsigned long}. Simply casting the value to 7010@samp{unsigned long} would not do the right thing, since it would treat 7011@code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and 7012the like. Here you can exploit the fact that all mentioned plural form 7013formulas eventually become periodic, with a period that is a divisor of 100 7014(or 1000 or 1000000). So, when you reduce a large value to another one in 7015the range [1000000, 1999999] that ends in the same 6 decimal digits, you 7016can assume that it will lead to the same plural form selection. This code 7017does this: 7018 7019@smallexample 7020#include <inttypes.h> 7021uintmax_t nbytes = ...; 7022printf (ngettext ("The file has %"PRIuMAX" byte.", 7023 "The file has %"PRIuMAX" bytes.", 7024 (nbytes > ULONG_MAX 7025 ? 
(nbytes % 1000000) + 1000000 7026 : nbytes)), 7027 nbytes); 7028@end smallexample 7029 7030Negative and floating-point values usually represent physical entities for 7031which singular and plural don't clearly apply. In such cases, there is no 7032need to use @code{ngettext}; a simple @code{gettext} call with a form suitable 7033for all values will do. For example: 7034 7035@smallexample 7036printf (gettext ("Time elapsed: %.3f seconds"), 7037 num_milliseconds * 0.001); 7038@end smallexample 7039 7040@noindent 7041Even if @var{num_milliseconds} happens to be a multiple of 1000, the output 7042@smallexample 7043Time elapsed: 1.000 seconds 7044@end smallexample 7045@noindent 7046is acceptable in English, and similarly for other languages. 7047 7048The translators' perspective regarding plural forms is explained in 7049@ref{Translating plural forms}. 7050 7051@node Optimized gettext 7052@subsection Optimization of the *gettext functions 7053@cindex optimization of @code{gettext} functions 7054 7055At this point of the discussion we should talk about an advantage of the 7056GNU @code{gettext} implementation. Some readers might have pointed out 7057that an internationalized program might have a poor performance if some 7058string has to be translated in an inner loop. While this is unavoidable 7059when the string varies from one run of the loop to the other it is 7060simply a waste of time when the string is always the same. Take the 7061following example: 7062 7063@example 7064@group 7065@{ 7066 while (@dots{}) 7067 @{ 7068 puts (gettext ("Hello world")); 7069 @} 7070@} 7071@end group 7072@end example 7073 7074@noindent 7075When the locale selection does not change between two runs the resulting 7076string is always the same. 
One way to use this is: 7077 7078@example 7079@group 7080@{ 7081 str = gettext ("Hello world"); 7082 while (@dots{}) 7083 @{ 7084 puts (str); 7085 @} 7086@} 7087@end group 7088@end example 7089 7090@noindent 7091But this solution is not usable in all situation (e.g.@: when the locale 7092selection changes) nor does it lead to legible code. 7093 7094For this reason, GNU @code{gettext} caches previous translation results. 7095When the same translation is requested twice, with no new message 7096catalogs being loaded in between, @code{gettext} will, the second time, 7097find the result through a single cache lookup. 7098 7099@node Comparison 7100@section Comparing the Two Interfaces 7101@cindex @code{gettext} vs @code{catgets} 7102@cindex comparison of interfaces 7103 7104@c FIXME: arguments to catgets vs. gettext 7105@c Partly done 950718 -- drepper 7106 7107The following discussion is perhaps a little bit colored. As said 7108above we implemented GNU @code{gettext} following the Uniforum 7109proposal and this surely has its reasons. But it should show how we 7110came to this decision. 7111 7112First we take a look at the developing process. When we write an 7113application using NLS provided by @code{gettext} we proceed as always. 7114Only when we come to a string which might be seen by the users and thus 7115has to be translated we use @code{gettext("@dots{}")} instead of 7116@code{"@dots{}"}. At the beginning of each source file (or in a central 7117header file) we define 7118 7119@example 7120#define gettext(String) (String) 7121@end example 7122 7123Even this definition can be avoided when the system supports the 7124@code{gettext} function in its C library. When we compile this code the 7125result is the same as if no NLS code is used. When you take a look at 7126the GNU @code{gettext} code you will see that we use @code{_("@dots{}")} 7127instead of @code{gettext("@dots{}")}. 
This reduces the number of
additional characters per translatable string to @emph{3} (in words:
three).

When a production version of the program is needed, we simply replace
the definition

@example
#define _(String) (String)
@end example

@noindent
by

@cindex include file @file{libintl.h}
@example
#include <libintl.h>
#define _(String) gettext (String)
@end example

@noindent
Additionally we run the program @file{xgettext} on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.

@cindex @code{N_}, a convenience macro
The same procedure can be followed for the @code{gettext_noop} invocations
(@pxref{Special cases}).  One usually defines @code{gettext_noop} as a
no-op macro.  So you should consider the following code for your project:

@example
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
@end example

@code{N_} is a short form similar to @code{_}.  The @file{Makefile} in
the @file{po/} directory of GNU @code{gettext} knows both of the
mentioned short forms by default, so you are invited to follow this
proposal for your own ease.

Now to @code{catgets}.  The main problem is the work for the
programmers.  Every time they come to a translatable string they have to
define a number (or a symbolic constant) which must also be defined in
the message catalog file.  They also have to take care of duplicate
entries, duplicate message IDs etc.  If they want to have the same
quality in the message catalog as the GNU @code{gettext} program
provides, they also have to put the descriptive comments for the strings
and the locations in all source code files into the message catalog.
This is nearly a Mission: Impossible.
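
The difference in bookkeeping can be sketched with the two lookup
styles side by side.  The constants @code{SET_MAIN} and
@code{MSG_HELLO} and the two helper functions are hypothetical names
chosen for this illustration; @code{catgets} and @code{gettext} are
used with their standard signatures.

```c
#include <libintl.h>
#include <nl_types.h>

/* Hypothetical set and message numbers.  With catgets they have to be
   kept in sync by hand with the message catalog source file.  */
#define SET_MAIN 1
#define MSG_HELLO 1

/* catgets style: the lookup key is the (set, number) pair; the string
   argument is only the fallback returned when the lookup fails.  */
static const char *
hello_catgets (nl_catd catd)
{
  return catgets (catd, SET_MAIN, MSG_HELLO, "Hello world");
}

/* gettext style: the original string itself is the lookup key, so no
   separate numbering has to be maintained.  */
static const char *
hello_gettext (void)
{
  return gettext ("Hello world");
}
```

In both styles the program still prints the original English string
when no catalog is available; what differs is that the @code{catgets}
caller must also maintain the numbering.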

But there are also some points people might call advantages of
@code{catgets}.  If you have a single word in a string and this string
is used in different contexts, it is likely that in one language or
another the word has different translations.  Example:

@example
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
@end example

Here we have to translate the string @code{"number"} twice.  Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning.  In German the
first appearance has to be translated to @code{"Anzahl"} and the second
to @code{"Zahl"}.

Now you can say that this example is really esoteric.  And you are
right!  This is exactly how we felt about this problem, and we decided
that it does not weigh that much.  The solution for the above problem
could be very easy:

@example
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
@end example

We believe that we can solve all conflicts with this method.  If it is
difficult, one can also consider changing one of the conflicting strings
a little bit.  But it is not impossible to overcome.

@code{catgets} allows the same original entry to have different
translations, but @code{gettext} has another, scalable approach for
solving ambiguities of this kind; see @ref{Ambiguities}.

@node Using libintl.a
@section Using libintl.a in own programs

Starting with version 0.9.4 the library @file{libintl.a} should be
self-contained.  I.e., you can use it in your own programs without
providing additional functions.
The @file{Makefile} will put the header
and the library in directories selected using @code{$(prefix)}.

@node gettext grok
@section Being a @code{gettext} grok

@strong{ NOTE: } This documentation section is outdated and needs to be
revised.

To fully exploit the functionality of the GNU @code{gettext} library it
is surely helpful to read the source code.  But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:

@itemize @bullet
@item Changing the language at runtime
@cindex language selection at runtime

For interactive programs it might be useful to offer a selection of the
language used at runtime.  To understand how to do this one needs to know
how the language used is determined while executing the @code{gettext}
function.  The method which is presented here only works correctly
with the GNU implementation of the @code{gettext} functions.

In the function @code{dcgettext}, at every call the current setting of
the highest priority environment variable is determined and used.
Highest priority here refers to the following list, in order of
decreasing priority:

@enumerate
@vindex LANGUAGE@r{, environment variable}
@item @code{LANGUAGE}
@vindex LC_ALL@r{, environment variable}
@item @code{LC_ALL}
@vindex LC_CTYPE@r{, environment variable}
@vindex LC_NUMERIC@r{, environment variable}
@vindex LC_TIME@r{, environment variable}
@vindex LC_COLLATE@r{, environment variable}
@vindex LC_MONETARY@r{, environment variable}
@vindex LC_MESSAGES@r{, environment variable}
@item @code{LC_xxx}, according to selected locale category
@vindex LANG@r{, environment variable}
@item @code{LANG}
@end enumerate

Afterwards the path is constructed using the found value and the
translation file is loaded if available.

What happens now when the value of, say, @code{LANGUAGE} changes?  According
to the process explained above, the new value of this variable is found
as soon as the @code{dcgettext} function is called.  But this also means
that a (perhaps) different message catalog file is loaded.  In other
words: the language used is changed.

But there is one little catch.  The code for gcc-2.7.0 and up provides
some optimization.  This optimization normally prevents the calling of
the @code{dcgettext} function as long as no new catalog is loaded.  But
if @code{dcgettext} is not called, the program cannot notice that the
@code{LANGUAGE} variable has changed (@pxref{Optimized gettext}).  The
solution is very easy: include the following code in the
language switching function.

@example
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  @{
    extern int _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  @}
@end example

@cindex @code{_nl_msg_cat_cntr}
The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
You don't need to know what this is for.  But it can be used to detect
whether a @code{gettext} implementation is GNU gettext rather than a
non-GNU system's native gettext implementation.

@end itemize

@node Temp Programmers
@section Temporary Notes for the Programmers Chapter

@strong{ NOTE: } This documentation section is outdated and needs to be
revised.
7308 7309@menu 7310* Temp Implementations:: Temporary - Two Possible Implementations 7311* Temp catgets:: Temporary - About @code{catgets} 7312* Temp WSI:: Temporary - Why a single implementation 7313* Temp Notes:: Temporary - Notes 7314@end menu 7315 7316@node Temp Implementations 7317@subsection Temporary - Two Possible Implementations 7318 7319There are two competing methods for language independent messages: 7320the X/Open @code{catgets} method, and the Uniforum @code{gettext} 7321method. The @code{catgets} method indexes messages by integers; the 7322@code{gettext} method indexes them by their English translations. 7323The @code{catgets} method has been around longer and is supported 7324by more vendors. The @code{gettext} method is supported by Sun, 7325and it has been heard that the COSE multi-vendor initiative is 7326supporting it. Neither method is a POSIX standard; the POSIX.1 7327committee had a lot of disagreement in this area. 7328 7329Neither one is in the POSIX standard. There was much disagreement 7330in the POSIX.1 committee about using the @code{gettext} routines 7331vs. @code{catgets} (XPG). In the end the committee couldn't 7332agree on anything, so no messaging system was included as part 7333of the standard. I believe the informative annex of the standard 7334includes the XPG3 messaging interfaces, ``@dots{}as an example of 7335a messaging system that has been implemented@dots{}'' 7336 7337They were very careful not to say anywhere that you should use one 7338set of interfaces over the other. For more on this topic please 7339see the Programming for Internationalization FAQ. 7340 7341@node Temp catgets 7342@subsection Temporary - About @code{catgets} 7343 7344There have been a few discussions of late on the use of 7345@code{catgets} as a base. I think it important to present both 7346sides of the argument and hence am opting to play devil's advocate 7347for a little bit. 
7348 7349I'll not deny the fact that @code{catgets} could have been designed 7350a lot better. It currently has quite a number of limitations and 7351these have already been pointed out. 7352 7353However there is a great deal to be said for consistency and 7354standardization. A common recurring problem when writing Unix 7355software is the myriad portability problems across Unix platforms. 7356It seems as if every Unix vendor had a look at the operating system 7357and found parts they could improve upon. Undoubtedly, these 7358modifications are probably innovative and solve real problems. 7359However, software developers have a hard time keeping up with all 7360these changes across so many platforms. 7361 7362And this has prompted the Unix vendors to begin to standardize their 7363systems. Hence the impetus for Spec1170. Every major Unix vendor 7364has committed to supporting this standard and every Unix software 7365developer waits with glee the day they can write software to this 7366standard and simply recompile (without having to use autoconf) 7367across different platforms. 7368 7369As I understand it, Spec1170 is roughly based upon version 4 of the 7370X/Open Portability Guidelines (XPG4). Because @code{catgets} and 7371friends are defined in XPG4, I'm led to believe that @code{catgets} 7372is a part of Spec1170 and hence will become a standardized component 7373of all Unix systems. 7374 7375@node Temp WSI 7376@subsection Temporary - Why a single implementation 7377 7378Now it seems kind of wasteful to me to have two different systems 7379installed for accessing message catalogs. If we do want to remedy 7380@code{catgets} deficiencies why don't we try to expand @code{catgets} 7381(in a compatible manner) rather than implement an entirely new system. 
7382Otherwise, we'll end up with two message catalog access systems installed 7383with an operating system - one set of routines for packages using GNU 7384@code{gettext} for their internationalization, and another set of routines 7385(catgets) for all other software. Bloated? 7386 7387Supposing another catalog access system is implemented. Which do 7388we recommend? At least for Linux, we need to attract as many 7389software developers as possible. Hence we need to make it as easy 7390for them to port their software as possible. Which means supporting 7391@code{catgets}. We will be implementing the @code{libintl} code 7392within our @code{libc}, but does this mean we also have to incorporate 7393another message catalog access scheme within our @code{libc} as well? 7394And what about people who are going to be using the @code{libintl} 7395+ non-@code{catgets} routines. When they port their software to 7396other platforms, they're now going to have to include the front-end 7397(@code{libintl}) code plus the back-end code (the non-@code{catgets} 7398access routines) with their software instead of just including the 7399@code{libintl} code with their software. 7400 7401Message catalog support is however only the tip of the iceberg. 7402What about the data for the other locale categories? They also have 7403a number of deficiencies. Are we going to abandon them as well and 7404develop another duplicate set of routines (should @code{libintl} 7405expand beyond message catalog support)? 7406 7407Like many parts of Unix that can be improved upon, we're stuck with balancing 7408compatibility with the past with useful improvements and innovations for 7409the future. 7410 7411@node Temp Notes 7412@subsection Temporary - Notes 7413 7414X/Open agreed very late on the standard form so that many 7415implementations differ from the final form. Both of my system (old 7416Linux catgets and Ultrix-4) have a strange variation. 7417 7418OK. 
After incorporating the last changes I have to spend some time on 7419making the GNU/Linux @code{libc} @code{gettext} functions. So in future 7420Solaris is not the only system having @code{gettext}. 7421 7422@node Translators 7423@chapter The Translator's View 7424 7425@c FIXME: Reorganize whole chapter. 7426 7427@menu 7428* Trans Intro 0:: Introduction 0 7429* Trans Intro 1:: Introduction 1 7430* Discussions:: Discussions 7431* Organization:: Organization 7432* Information Flow:: Information Flow 7433* Translating plural forms:: How to fill in @code{msgstr[0]}, @code{msgstr[1]} 7434* Prioritizing messages:: How to find which messages to translate first 7435@end menu 7436 7437@node Trans Intro 0 7438@section Introduction 0 7439 7440@strong{ NOTE: } This documentation section is outdated and needs to be 7441revised. 7442 7443Free software is going international! The Translation Project is a way 7444to get maintainers, translators and users all together, so free software 7445will gradually become able to speak many native languages. 7446 7447The GNU @code{gettext} tool set contains @emph{everything} maintainers 7448need for internationalizing their packages for messages. It also 7449contains quite useful tools for helping translators at localizing 7450messages to their native language, once a package has already been 7451internationalized. 7452 7453To achieve the Translation Project, we need many interested 7454people who like their own language and write it well, and who are also 7455able to synergize with other translators speaking the same language. 7456If you'd like to volunteer to @emph{work} at translating messages, 7457please send mail to your translating team. 7458 7459Each team has its own mailing list, courtesy of Linux 7460International. You may reach your translating team at the address 7461@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639} 7462code for your language. 
Language codes are @emph{not} the same as
country codes given in @w{ISO 3166}. The following translating teams
exist:

@quotation
Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
Swedish @code{sv} and Turkish @code{tr}.
@end quotation

@noindent
For example, you may reach the Chinese translating team by writing to
@file{zh@@li.org}. When you become a member of the translating team
for your own language, you may subscribe to its list. For example,
Swedish people can send a message to @w{@file{sv-request@@li.org}}
with this message body:

@example
subscribe
@end example

Keep in mind that team members should be interested in @emph{working}
at translations, or at solving translational difficulties, rather than
merely lurking around. If your team does not exist yet and you want to
start one, please write to @w{@file{coordinator@@translationproject.org}};
you will then reach the coordinator for all translator teams.

A handful of GNU packages have already been adapted and provided
with message translations for several languages. Translation
teams have begun to organize, using these packages as a starting
point. But there are many more packages and many languages for
which we have no volunteer translators. If you would like to
volunteer to work at translating messages, please send mail to
@file{coordinator@@translationproject.org} indicating what language(s)
you can work on.

@node Trans Intro 1
@section Introduction 1

@strong{ NOTE: } This documentation section is outdated and needs to be
revised.

It is now official: GNU is going international! Here is the
announcement submitted for the January 1995 GNU Bulletin:

@quotation
A handful of GNU packages have already been adapted and provided
with message translations for several languages. Translation
teams have begun to organize, using these packages as a starting
point. But there are many more packages and many languages
for which we have no volunteer translators. If you'd like to
volunteer to work at translating messages, please send mail to
@samp{coordinator@@translationproject.org} indicating what language(s)
you can work on.
@end quotation

This document should answer many questions for those who are curious about
the process or would like to contribute. Please at least skim over it;
doing so should cut down a little on the high volume of e-mail generated by
this collective effort towards internationalization of free software.

Most free programming which is widely shared is done in English, and
currently, English is used as the main communicating language between
national communities collaborating on free software. This very document
is written in English. This will not change in the foreseeable future.

However, there is a strong appetite from national communities for
having more software able to write using national language and habits,
and there is an on-going effort to modify free software in such a way
that it becomes able to do so. The experiments conducted so far raised
an enthusiastic response from pretesters, so we believe that
internationalization of free software is destined to succeed.

To suggest clarifications, additions or corrections to this
document, please send e-mail to @file{coordinator@@translationproject.org}.

@node Discussions
@section Discussions

@strong{ NOTE: } This documentation section is outdated and needs to be
revised.

Facing this internationalization effort, a few users expressed their
concerns. Some of these doubts are presented and discussed here.

@itemize @bullet
@item Smaller groups

Some languages are not spoken by a very large number of people, so people
speaking them sometimes consider that there may not be all that much
demand for such versions of free software packages. Moreover, many people
who are @emph{into computers}, in some countries, generally seem to prefer
English versions of their software.

On the other hand, people might enjoy their own language a lot, and be
very motivated to give themselves the pleasure of having their
beloved free software speak their mother tongue. They do themselves
a personal favor, and do not pay that much attention to the number of
people benefiting from their work.

@item Misinterpretation

Other users are shy to push forward their own language, seeing in this
some kind of misplaced propaganda. Someone thought there must be some
users of the language over the networks pestering other people with it.

But any spoken language is worth localization, because there are
people behind the language for whom the language is important and
dear to their hearts.

@item Odd translations

The biggest problem is to find the right translations so that
everybody can understand the messages. Translations are usually a
little odd.
Some people get used to English, to the extent they may
find translations into their own language ``rather pushy, obnoxious
and sometimes even hilarious.'' As a French speaking man, I have
the experience of those instruction manuals for goods, so poorly
translated into French in Korea or Taiwan@dots{}

The fact is that we sometimes have to create a kind of national
computer culture, and this is not easy without the collaboration of
many people liking their mother tongue. This is why translations are
better achieved by people knowing and loving their own language, and
ready to work together at improving the results they obtain.

@item Dependencies over the GPL or LGPL

Some people wonder if using GNU @code{gettext} necessarily brings their
package under the protective wing of the GNU General Public License or
the GNU Lesser General Public License, when they do not want to make
their program free, or want other kinds of freedom. The simplest
answer is ``normally not''.

The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the
contents of @code{libintl}, is covered by the GNU Lesser General Public
License. The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the
rest of the GNU @code{gettext} package, is covered by the GNU General
Public License.

The mere marking of localizable strings in a package, or conditional
inclusion of a few lines for initialization, is not really including
GPL'ed or LGPL'ed code. However, since the localization routines in
@code{libintl} are under the LGPL, the LGPL needs to be considered.
It gives the right to distribute the complete unmodified source of
@code{libintl} even with non-free programs. It also gives the right
to use @code{libintl} as a shared library, even for non-free programs.
But it gives the right to use @code{libintl} as a static library or
to incorporate @code{libintl} into another library only to free
software.

@end itemize

@node Organization
@section Organization

@strong{ NOTE: } This documentation section is outdated and needs to be
revised.

On a larger scale, the true solution would be to organize some kind of
fairly precise set up in which volunteers could participate. I gave
some thought to this idea lately, and realize there will be some
touchy points. I thought of writing to Richard Stallman to launch
such a project, but feel it might be good to shake out the ideas
between ourselves first. Most probably Linux International has
some experience in the field already, or would like to orchestrate
the volunteer work, maybe. Food for thought, in any case!

I guess we have to set up something early, somehow, that will help
many possible contributors of the same language to interlock and avoid
work duplication, and further be put in contact for solving together
problems particular to their tongue (in most languages, there are many
difficulties peculiar to translating technical English). My Swedish
contributor acknowledged these difficulties, and I'm well aware of
them for French.

This is surely not a technical issue, but we should manage so that the
effort of locale contributors is maximally useful, despite the national
team layer interface between contributors and maintainers.

The Translation Project needs some setup for coordinating language
coordinators. Localizing evolving programs will surely
become a permanent and continuous activity in the free software community,
once well started.
The setup should be minimally completed and tested before GNU
@code{gettext} becomes an official reality.
The e-mail address
@file{coordinator@@translationproject.org} has been set up for receiving
offers from volunteers and general e-mail on these topics. This address
reaches the Translation Project coordinator.

@menu
* Central Coordination::        Central Coordination
* National Teams::              National Teams
* Mailing Lists::               Mailing Lists
@end menu

@node Central Coordination
@subsection Central Coordination

I also think GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups. Some kind of group
of groups. My opinion is that it would be good that GNU delegates
this task to a small group of collaborating volunteers, shortly.
Perhaps a list of these national committees can be published in
@file{gnu.announce}.

My role as coordinator would simply be to refer to Ulrich any German
speaking volunteer interested in localization of free software packages, and
maybe helping national groups to initially organize, while maintaining
national registries until national groups are ready to take over.
In fact, the coordinator should help volunteers get in contact with
one another for creating national teams, which should then select
one coordinator per language, or country (regionalized language).
If well done, the coordination should be useful without being an
overwhelming task, the time to put delegations in place.

@node National Teams
@subsection National Teams

I suggest we look for volunteer coordinators/editors for individual
languages. These people will scan contributions of translation files
for various programs, for their own languages, and will ensure high
and uniform standards of diction.

From my current experience with other people in these days, those who
provide localizations are very enthusiastic about the process, and are
more interested in the localization process than in the program they
localize, and want to do many programs, not just one. This seems
to confirm that having a coordinator/editor for each language is a
good idea.

We need to choose someone who is good at writing clear and concise
prose in the language in question. That is hard---we can't check
it ourselves. So we need to ask a few people to judge each other's
writing and select the one who is best.

I announced my prerelease to a few dozen people, and you would not
believe all the discussions it generated already. I shudder to think
what will happen when this will be launched, for true, officially,
world wide. Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?

I assume that your German is not much better than my French so that
I would not be able to judge about these formulations. What I would
suggest is that for each language there is a group for people who
maintain the PO files and judge about changes. I suspect there will
be cultural differences between how such groups of people will behave.
Some will have relaxed ways, reach consensus easily, and have anyone
of the group relate to the maintainers, while others will fight to
death, organize heavy administrations up to national standards, and
use strict channels.

The German team is setting a good example. Right now, they are
maybe half a dozen people revising translations of each other and
discussing the linguistic issues. I do not even have all the names.
Ulrich Drepper is taking care of coordinating the German team.
He subscribed to all my pretest lists, so I do not even have to warn
him specifically of incoming releases.

I'm sure that it is a good idea to get teams for each language working
on translations. That will make the translations better and more
consistent.

@menu
* Sub-Cultures::                Sub-Cultures
* Organizational Ideas::        Organizational Ideas
@end menu

@node Sub-Cultures
@subsubsection Sub-Cultures

Taking French for example, there are a few sub-cultures around computers
which developed diverging vocabularies. Picking volunteers here and
there without addressing this problem in an organized way, early in the
project, might produce a distasteful mix of internationalized programs,
and possibly trigger endless quarrels among those who really care.

Keeping some kind of unity in the way French localization of
internationalized programs is achieved is a difficult (and delicate) job.
Knowing the Latin character of French people (:-), if we take this
the wrong way, we could end up nowhere, or waste a lot of energy.
Maybe we should begin to address this problem seriously @emph{before}
GNU @code{gettext} becomes officially published. And I suspect that this
means soon!

@node Organizational Ideas
@subsubsection Organizational Ideas

I expect the next big changes after the official release. Please note
that I use the German translation of the short GPL message. We need
to set a few good examples before the localization goes out for true
in the free software community. Here are a few points to discuss:

@itemize @bullet
@item
Each group should have one FTP server (at least one master).

@item
The files on the server should reflect the latest version (of
course!) and it should also contain an RCS directory with the
corresponding archives (I don't have this now).

@item
There should also be a ChangeLog file (this is more useful than the
RCS archive but can be generated automatically from the latter by
Emacs).

@item
A @dfn{core group} should judge about questionable changes (for now
this group consists solely of me but I ask some others occasionally;
this also seems to work).

@end itemize

@node Mailing Lists
@subsection Mailing Lists

If we get any inquiries about GNU @code{gettext}, send them on to:

@example
@file{coordinator@@translationproject.org}
@end example

The @file{*-pretest} lists are quite useful to me, maybe the idea could
be generalized to many GNU, and non-GNU packages. But to each maintainer
his or her own way!

Fran@,{c}ois, we have a mechanism in place here at
@file{gnu.ai.mit.edu} to track teams, support mailing lists for
them and log members. We have a slight preference that you use it.
If this is OK with you, I can get you clued in.

Things are changing! A few years ago, when Daniel Fekete and I
asked for a mailing list for GNU localization, nested at the FSF, we
were politely invited to organize it anywhere else, and so did we.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
@code{majordomo}. These lists have been @emph{very} dependable
so far@dots{}

I suspect that the German team will organize itself a mailing list
located in Germany, and so forth for other countries. But before they
organize for true, it could surely be useful to offer mailing lists
located at the FSF to each national team. So yes, please explain to me
how I should proceed to create and handle them.

We should create temporary mailing lists, one per country, to help
people organize.
Temporary, because once regrouped and structured, it
would be fair for the volunteers from each country to bring @emph{their}
list back there and manage it as they want. My feeling is that, in the long
run, each team should run its own list, from within their country.
There also should be some central list to which all teams could
subscribe as they see fit, as long as each team is represented in it.

@node Information Flow
@section Information Flow

@strong{ NOTE: } This documentation section is outdated and needs to be
revised.

There will surely be some discussion about these messages after the
packages are finally released. If people now send you some proposals
for better messages, how do you proceed? Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
receive both the translations and the coordination concerns about them.

If I put one of my things to pretest, Ulrich receives the announcement
and passes it on to the German team, who make last minute revisions.
Then he submits the translation files to me @emph{as the maintainer}.
For free packages I do not maintain, I would not even hear about it.
This scheme could be made to work for the whole Translation Project,
I think. For security reasons, maybe Ulrich (national coordinators,
in fact) should update the central registry kept at the Translation Project
(Jim, me, or Len's recruits) once in a while.

In December/January, I was aggressively ready to internationalize
all of GNU, giving myself the duty of one small GNU package per week
or so, taking many weeks or months for bigger packages. But it does
not work this way. I first did all the things I'm responsible for.
I've nothing against some missionary work on other maintainers, but
I'm also losing a lot of energy over it---same debates over again.

And when the first localized packages are released we'll get a lot of
responses about ugly translations :-). Surely, and we need to have
beforehand a fairly good idea about how to handle the information
flow between the national teams and the package maintainers.

Please start saving somewhere a quick history of each PO file. I know
for sure that the file format will change, allowing for comments.
It would be nice that each file has a kind of log, and references for
those who want to submit comments or gripes, or otherwise contribute.
I sent a proposal for a fast and flexible format, but it has not yet
been accepted by the GNU deciders. I'll tell you when I
have more information about this.

@node Translating plural forms
@section Translating plural forms

@cindex plural forms, translating
Suppose you are translating a PO file, and it contains an entry like this:

@smallexample
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] ""
msgstr[1] ""
@end smallexample

@noindent
What does this mean? How do you fill it in?

Such an entry denotes a message with plural forms, that is, a message where
the text depends on a cardinal number. The general form of the message,
in English, is the @code{msgid_plural} line. The @code{msgid} line is the
English singular form, that is, the form for when the number is equal to 1.
More details about plural forms are explained in @ref{Plural forms}.

The first thing you need to look at is the @code{Plural-Forms} line in the
header entry of the PO file. It contains the number of plural forms and a
formula. If the PO file does not yet have such a line, you have to add it.
It only depends on the language into which you are translating.
You can
get this info by using the @code{msginit} command (see @ref{Creating}) --
it contains a database of known plural formulas -- or by asking other
members of your translation team.

Suppose the line looks as follows:

@smallexample
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
"%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
@end smallexample

It's logically one line; recall that the PO file formatting is allowed to
break long lines so that each physical line fits in 80 monospaced columns.

The value of @code{nplurals} here tells you that there are three plural
forms. The first thing you need to do is to ensure that the entry contains
an @code{msgstr} line for each of the forms:

@smallexample
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] ""
msgstr[1] ""
msgstr[2] ""
@end smallexample

Then translate the @code{msgid_plural} line and fill the translation into
each @code{msgstr} line:

@smallexample
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika uklonjenih"
msgstr[1] "%d slika uklonjenih"
msgstr[2] "%d slika uklonjenih"
@end smallexample

Now you can refine the translation so that it matches the plural form.
According to the formula above, @code{msgstr[0]} is used when the number
ends in 1 but does not end in 11; @code{msgstr[1]} is used when the number
ends in 2, 3, 4, but not in 12, 13, 14; and @code{msgstr[2]} is used in
all other cases.
With this knowledge, you can refine the translations:

@smallexample
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika je uklonjena"
msgstr[1] "%d datoteke uklonjenih"
msgstr[2] "%d slika uklonjenih"
@end smallexample

You may have noticed that in the English singular form (@code{msgid}) the
number placeholder could be omitted and replaced by the numeral word ``one''.
Can you do this in your translation as well?

@smallexample
msgstr[0] "jednom datotekom je uklonjen"
@end smallexample

@noindent
Well, it depends on whether @code{msgstr[0]} applies only to the number 1,
or to other numbers as well. If, according to the plural formula,
@code{msgstr[0]} applies only to @code{n == 1}, then you can use the
specialized translation without the number placeholder. In our case,
however, @code{msgstr[0]} also applies to the numbers 21, 31, 41, etc.,
and therefore you cannot omit the placeholder.

@node Prioritizing messages
@section Prioritizing messages: How to determine which messages to translate first

A translator sometimes has only a limited amount of time per week to
spend on a package, and some packages have quite large message catalogs
(over 1000 messages). Therefore she wishes to translate first the messages
that are the most visible to the user, or that occur most frequently.
This section describes how to determine these ``most urgent'' messages.
It also applies to determining the ``next most urgent'' messages after the
message catalog has already been partially translated.

In a first step, she uses the programs like a user would do. While she
does this, the GNU @code{gettext} library logs into a file the not yet
translated messages for which a translation was requested from the program.

In a second step, she uses the PO mode to translate precisely this set
of messages.

@vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable}
Here are more details. The GNU @code{libintl} library (but not the
corresponding functions in GNU @code{libc}) supports an environment variable
@code{GETTEXT_LOG_UNTRANSLATED}, whose value is the name of a log file.
The GNU @code{libintl} library will
log into this file the messages for which @code{gettext()} and related
functions couldn't find the translation. If the file doesn't exist, it
will be created as needed. On systems with GNU @code{libc} a shared library
@samp{preloadable_libintl.so} is provided that can be used with the ELF
@samp{LD_PRELOAD} mechanism.

So, in the first step, the translator uses these commands on systems with
GNU @code{libc}:

@smallexample
$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
$ export LD_PRELOAD
$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED
@end smallexample

@noindent
and these commands on other systems:

@smallexample
$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED
@end smallexample

Then she uses and peruses the programs. (It is a good and recommended
practice to use the programs for which you provide translations: it
gives you the needed context.) When done, she removes the environment
variables:

@smallexample
$ unset LD_PRELOAD
$ unset GETTEXT_LOG_UNTRANSLATED
@end smallexample

The second step starts with removing duplicates:

@smallexample
$ msguniq $HOME/gettextlogused > missing.po
@end smallexample

The result is a PO file, but it needs some preprocessing before a PO file
editor can be used with it. First, it is a multi-domain PO file, containing
messages from many translation domains. Second, it lacks all translator
comments and source references.
Here is how to get a list of the affected
translation domains:

@smallexample
$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
@end smallexample

Then the translator can handle the domains one by one. For simplicity,
let's use shell variables to denote the language, domain and source
package.

@smallexample
$ lang=nl             # your language
$ domain=coreutils    # the name of the domain to be handled
$ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from
@end smallexample

She takes the latest copy of @file{$lang.po} from the Translation Project,
or from the package (in most cases, @file{$package/po/$lang.po}), or
creates a fresh one if she's the first translator (see @ref{Creating}).
She then uses the following commands to mark the non-urgent messages as
``obsolete''. (This doesn't mean that these messages - translated and
untranslated ones - will go away. It simply means that the PO file editor
will ignore them in the following editing session.)

@smallexample
$ msggrep --domain=$domain missing.po | grep -v '^domain' \
  > $domain-missing.po
$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
  > $domain.$lang-urgent.po
@end smallexample

Then she translates @file{$domain.$lang-urgent.po} by use of a PO file editor
(@pxref{Editing}).
(FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also
preserve obsolete messages, as they should.)
Finally she restores the non-urgent messages (with their earlier
translations, for those which were already translated) through this command:

@smallexample
$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
  > $domain.$lang.po
@end smallexample

Then she can submit @file{$domain.$lang.po} and proceed to the next domain.
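
The domain-listing command used earlier in this section can be tried in
isolation before it is run against a real log. The following is a minimal
sketch under stated assumptions: the sample file stands in for the real
@file{missing.po}, and its domain names, its messages and the temporary
directory are invented for illustration. Only standard tools are used
(@code{sed}, @code{sort}, @code{grep}), so no GNU @code{gettext} program is
needed to run it.

```shell
# Hypothetical sample: a tiny multi-domain PO file of the kind the
# de-duplication step might produce.  Domains and messages are made up.
workdir=$(mktemp -d)
cat > "$workdir/missing.po" <<'EOF'
domain "coreutils"

msgid "cannot remove %s"
msgstr ""

msgid "missing operand"
msgstr ""

domain "gettext-runtime"

msgid "too many arguments"
msgstr ""
EOF

# List every translation domain once, using the same sed expression
# as in the text; "sort -u" is equivalent to "sort | uniq".
domains=$(sed -n -e 's,^domain "\(.*\)"$,\1,p' < "$workdir/missing.po" | sort -u)
printf '%s\n' "$domains"

# Count the untranslated messages that were logged, over all domains.
total=$(grep -c '^msgid "' "$workdir/missing.po")
printf 'messages logged: %s\n' "$total"

# Remove the scratch directory again.
rm -r "$workdir"
```

With this sample the sketch prints the two domain names, one per line, and
then a total of three messages. Running the same two commands on a real
@file{missing.po} gives a rough idea of how much work each domain holds
before the per-domain steps are started.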

@node Maintainers
@chapter The Maintainer's View
@cindex package maintainer's view of @code{gettext}

The maintainer of a package has many responsibilities. One of them
is ensuring that the package will install easily on many platforms,
and that the magic we described earlier (@pxref{Users}) will work
for installers and end users.

Of course, there are many possible ways by which GNU @code{gettext}
might be integrated in a distribution, and this chapter does not cover
them in all generality. Instead, it details one possible approach which
is especially adequate for many free software distributions following GNU
standards, or even better, Gnits standards, because GNU @code{gettext}
is purposely for helping the internationalization of the whole GNU
project, and as many other good free packages as possible. So, the
maintainer's view presented here presumes that the package already has
a @file{configure.ac} file and uses GNU Autoconf.

Nevertheless, GNU @code{gettext} may surely be useful for free packages
not following GNU standards and conventions, but the maintainers of such
packages might have to show imagination and initiative in organizing
their distributions so @code{gettext} works for them in all situations.
There are surely many, out there.

Even if @code{gettext} methods are now stabilizing, slight adjustments
might be needed between successive @code{gettext} versions, so you
should ideally revise this chapter in subsequent releases, looking
for changes.

@menu
* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
* Prerequisites::               Prerequisite Works
* gettextize Invocation::       Invoking the @code{gettextize} Program
* Adjusting Files::             Files You Must Create or Alter
* autoconf macros::             Autoconf macros for use in @file{configure.ac}
* Version Control Issues::
* Release Management::          Creating a Distribution Tarball
@end menu

@node Flat and Non-Flat
@section Flat or Non-Flat Directory Structures

Some free software packages are distributed as @code{tar} files which unpack
in a single directory; these are said to be @dfn{flat} distributions.
Other free software packages have a one level hierarchy of subdirectories, using
for example a subdirectory named @file{doc/} for the Texinfo manual and
man pages, another called @file{lib/} for holding functions meant to
replace or complement C libraries, and a subdirectory @file{src/} for
holding the proper sources for the package. These other distributions
are said to be @dfn{non-flat}.

We cannot say much about flat distributions. A flat
directory structure has the disadvantage of increasing the difficulty
of updating to a new version of GNU @code{gettext}. Also, if you have
many PO files, this could somewhat pollute your single directory.
Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
scripts, @code{sed} scripts and complicated Makefile rules, which don't
fit well into an existing flat structure. For these reasons, we
recommend using the non-flat approach in this case as well.

Maybe because GNU @code{gettext} itself has a non-flat structure,
we have more experience with this approach, and this is what will be
described in the remainder of this chapter. Some maintainers might
use this as an opportunity to unflatten their package structure.

@node Prerequisites
@section Prerequisite Works
@cindex converting a package to use @code{gettext}
@cindex migration from earlier versions of @code{gettext}
@cindex upgrading to new versions of @code{gettext}

Some preparatory work is required before using GNU @code{gettext}
in one of your packages. This work has a kind of generality
that escapes the point by point descriptions used in the remainder
of this chapter. So, we describe it here.

@itemize @bullet
@item
Before attempting to use @code{gettextize} you should install some
other packages first.
Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
@code{gettext} are already installed at your site, and if not, proceed
to do this first. If you have to install these things, beware that
GNU @code{m4} must be fully installed before GNU Autoconf is even
@emph{configured}.

To further ease the task of a package maintainer the @code{automake}
package was designed and implemented. GNU @code{gettext} now uses this
tool and the @file{Makefile} in the @file{po/} directory therefore
knows about all the goals necessary for using @code{automake}.

Those four packages are only needed by you, as a maintainer; the
installers of your own package and end users do not really need any of
GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
for successfully installing and running your package, with messages
properly translated. But this is not completely true if you provide
internationalized shell scripts within your own package: GNU
@code{gettext} shall then be installed at the user site if the end users
want to see the translation of shell script messages.

@item
Your package should use Autoconf and have a @file{configure.ac} or
@file{configure.in} file.
If it does not, you have to learn how.
The Autoconf documentation
is quite well written; it is a good idea to print it and get
familiar with it.

@item
Your C sources should have already been modified according to
instructions given earlier in this manual. @xref{Sources}.

@item
Your @file{po/} directory should receive all PO files submitted to you
by the translator teams, each having @file{@var{ll}.po} as a name.
It is not usually easy to get translation
work done before your package gets internationalized and available!
Since the cycle has to start somewhere, the easiest for the maintainer
is to start with absolutely no PO files, and wait until various
translator teams get interested in your package, and submit PO files.

@end itemize

It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions. As a maintainer, your role is
to verify that the submission originates from the representative
of the appropriate translation team of the Translation Project (forward
the submission to @file{coordinator@@translationproject.org} in case of doubt),
to ensure that the PO file format is not severely broken and does not
prevent successful installation, and for the rest, to merely put these
PO files in @file{po/} for distribution.

As a maintainer, you do not have to take on your shoulders the
responsibility of checking if the translations are adequate or
complete, and should avoid diving into linguistic matters. Translation
teams drive themselves and are fully responsible for their linguistic
choices for the Translation Project. Keep in mind that translator teams are @emph{not}
driven by maintainers. You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
their team.

Maintainers should @emph{never ever} apply PO file bug reports
themselves, short-cutting translation teams. If some translator has
difficulty getting some of her points through her team, it should not be
an option for her to directly negotiate translations with maintainers.
Teams ought to settle their problems themselves, if any. If you, as
a maintainer, ever think there is a real problem with a team, please
never try to @emph{solve} a team's problem on your own.

@node gettextize Invocation
@section Invoking the @code{gettextize} Program

@include gettextize.texi

@node Adjusting Files
@section Files You Must Create or Alter
@cindex @code{gettext} files

Besides files which are automatically added through @code{gettextize},
there are many files needing revision for properly interacting with
GNU @code{gettext}. If you are closely following GNU standards for
Makefile engineering and auto-configuration, the adaptations should
be easier to achieve. Here is a point by point description of the
changes needed in each.

So, here comes a list of files, each one followed by a description of
all alterations it needs. Many examples are taken from the GNU
@code{gettext} @value{VERSION} distribution itself, or from the GNU
@code{hello} distribution (@uref{https://www.gnu.org/software/hello}).
You may indeed refer to the source code of the GNU @code{gettext} and
GNU @code{hello} packages, as they are intended to be good examples for
using GNU gettext functionality.

@menu
* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
* po/LINGUAS::                  @file{LINGUAS} in @file{po/}
* po/Makevars::                 @file{Makevars} in @file{po/}
* po/Rules-*::                  Extending @file{Makefile} in @file{po/}
* configure.ac::                @file{configure.ac} at top level
* config.guess::                @file{config.guess}, @file{config.sub} at top level
* mkinstalldirs::               @file{mkinstalldirs} at top level
* aclocal::                     @file{aclocal.m4} at top level
* config.h.in::                 @file{config.h.in} at top level
* Makefile::                    @file{Makefile.in} at top level
* src/Makefile::                @file{Makefile.in} in @file{src/}
* lib/gettext.h::               @file{gettext.h} in @file{lib/}
@end menu

@node po/POTFILES.in
@subsection @file{POTFILES.in} in @file{po/}
@cindex @file{POTFILES.in} file

The @file{po/} directory should receive a file named
@file{POTFILES.in}. This file tells which files, among all program
sources, have marked strings needing translation. Here is an example
of such a file:

@example
@group
# List of source files containing translatable strings.
# Copyright (C) 1995 Free Software Foundation, Inc.

# Common library files
lib/error.c
lib/getopt.c
lib/xmalloc.c

# Package source files
src/gettext.c
src/msgfmt.c
src/xgettext.c
@end group
@end example

@noindent
Hash-marked comments and blank lines are ignored. All other lines
list those source files containing strings marked for translation
(@pxref{Mark Keywords}), in a notation relative to the top level
of your whole distribution, rather than the location of the
@file{POTFILES.in} file itself.

When a C file is automatically generated by a tool, like @code{flex} or
@code{bison}, that doesn't introduce translatable strings by itself,
it is recommended to list in @file{po/POTFILES.in} the real source file
(ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the
case of @code{bison}), not the generated C file.

@node po/LINGUAS
@subsection @file{LINGUAS} in @file{po/}
@cindex @file{LINGUAS} file

The @file{po/} directory should also receive a file named
@file{LINGUAS}. This file contains the list of available translations.
It is a whitespace separated list. Hash-marked comments and blank lines
are ignored. Here is an example file:

@example
@group
# Set of available languages.
de fr
@end group
@end example

@noindent
This example means that German and French PO files are available, so
that these languages are currently supported by your package. If you
want to further restrict, at installation time, the set of installed
languages, this should not be done by modifying the @file{LINGUAS} file,
but rather by using the @code{LINGUAS} environment variable
(@pxref{Installers}).

It is recommended that you add the "languages" @samp{en@@quot} and
@samp{en@@boldquot} to the @code{LINGUAS} file. @code{en@@quot} is a
variant of English message catalogs (@code{en}) which uses real quotation
marks instead of the ugly looking asymmetric ASCII substitutes @samp{`}
and @samp{'}. @code{en@@boldquot} is a variant of @code{en@@quot} that
additionally outputs quoted pieces of text in a bold font, when used in
a terminal emulator which supports the VT100 escape sequences (such as
@code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode).
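
For illustration, a @file{LINGUAS} file that follows this
recommendation, extending the German and French example shown earlier,
might read as follows. Putting the two extra entries on a line of their
own is only a layout choice in this sketch, since the list is whitespace
separated.

@example
@group
# Set of available languages.
de fr
en@@quot en@@boldquot
@end group
@end example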

These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot}
are constructed automatically, not by translators; to support them, you
need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed},
@file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin}
in the @file{po/} directory. You can copy them from GNU gettext's @file{po/}
directory; they are also installed by running @code{gettextize}.

@node po/Makevars
@subsection @file{Makevars} in @file{po/}
@cindex @file{Makevars} file

The @file{po/} directory also has a file named @file{Makevars}. It
contains variables that are specific to your project. @file{po/Makevars}
gets inserted into the @file{po/Makefile} when the latter is created.
The variables thus take effect when the POT file is created or updated,
and when the message catalogs get installed.

The first three variables can be left unmodified if your package has a
single message domain and, accordingly, a single @file{po/} directory.
Only packages which have multiple @file{po/} directories at different
locations need to adjust the first three variables defined in
@file{Makevars}.

As an alternative to the @code{XGETTEXT_OPTIONS} variable, it is also
possible to specify @code{xgettext} options through the
@code{AM_XGETTEXT_OPTION} autoconf macro. See @ref{AM_XGETTEXT_OPTION}.

@node po/Rules-*
@subsection Extending @file{Makefile} in @file{po/}
@cindex @file{Makefile.in.in} extensions

All files called @file{Rules-*} in the @file{po/} directory get appended to
the @file{po/Makefile} when it is created. They present an opportunity to
add rules for special PO files to the Makefile, without needing to mess
with @file{po/Makefile.in.in}.

@cindex quotation marks
@vindex LANGUAGE@r{, environment variable}
GNU gettext comes with a @file{Rules-quot} file, containing rules for
building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}. The
effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE}
environment variable to @samp{en@@quot} will get messages with proper
looking symmetric Unicode quotation marks instead of abusing the ASCII
grave accent and the ASCII apostrophe for indicating quotations. To
enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS}
file. The effect of @file{en@@boldquot.po} is that people who set
@code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation
marks, but also the quoted text will be shown in a bold font on terminals
and consoles. This catalog is useful only for command-line programs, not
GUI programs. To enable it, similarly add @code{en@@boldquot} to the
@file{po/LINGUAS} file.

Similarly, you can create rules for building message catalogs for the
@file{sr@@latin} locale -- Serbian written with the Latin alphabet --
from those for the @file{sr} locale -- Serbian written with Cyrillic
letters. See @ref{msgfilter Invocation}.

@node configure.ac
@subsection @file{configure.ac} at top level

@file{configure.ac} or @file{configure.in} - this is the source from which
@code{autoconf} generates the @file{configure} script.

@enumerate
@item Declare the package and version.
@cindex package and version declaration in @file{configure.ac}

This is done by a set of lines like these:

@example
PACKAGE=gettext
VERSION=@value{VERSION}
AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
AC_SUBST(PACKAGE)
AC_SUBST(VERSION)
@end example

@noindent
or, if you are using GNU @code{automake}, by a line like this:

@example
AM_INIT_AUTOMAKE(gettext, @value{VERSION})
@end example

@noindent
Of course, you replace @samp{gettext} with the name of your package,
and @samp{@value{VERSION}} with its version number, exactly as they
should appear in the packaged @code{tar} file name of your distribution
(@file{gettext-@value{VERSION}.tar.gz}, here).

@item Check for internationalization support.

Here is the main @code{m4} macro for triggering internationalization
support. Just add this line to @file{configure.ac}:

@example
AM_GNU_GETTEXT([external])
@end example

@noindent
This call is purposely simple, even if it generates a lot of configure
time checking and actions.

@item Have output files created.

The @code{AC_OUTPUT} directive, at the end of your @file{configure.ac}
file, needs to be modified as follows:

@example
AC_OUTPUT([@var{existing configuration files} po/Makefile.in],
[@var{existing additional actions}])
@end example

The modification to the first argument to @code{AC_OUTPUT} asks
for substitution in the @file{po/} directory.
Note the @samp{.in} suffix used for @file{po/} only. This is because
the distributed file is really @file{po/Makefile.in.in}.

@end enumerate

@node config.guess
@subsection @file{config.guess}, @file{config.sub} at top level

You need to add the GNU @file{config.guess} and @file{config.sub} files
to your distribution.
They are needed because the @code{AM_ICONV} macro
contains knowledge about specific platforms and therefore needs to
identify the platform.

You can obtain the newest version of @file{config.guess} and
@file{config.sub} from the @samp{config} project at
@file{https://savannah.gnu.org/}. The commands to fetch them are
@smallexample
$ wget -O config.guess 'https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
$ wget -O config.sub 'https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'
@end smallexample
@noindent
Less recent versions are also contained in the GNU @code{automake} and
GNU @code{libtool} packages.

Normally, @file{config.guess} and @file{config.sub} are put at the
top level of a distribution. But it is also possible to put them in a
subdirectory, together with other configuration support files like
@file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}.
All you need to do, other than moving the files, is to add the following line
to your @file{configure.ac}.

@example
AC_CONFIG_AUX_DIR([@var{subdir}])
@end example

@node mkinstalldirs
@subsection @file{mkinstalldirs} at top level
@cindex @file{mkinstalldirs} file

With earlier versions of GNU gettext, you needed to add the GNU
@file{mkinstalldirs} script to your distribution. This is not needed any
more. You can remove it.

@node aclocal
@subsection @file{aclocal.m4} at top level
@cindex @file{aclocal.m4} file

If you do not have an @file{aclocal.m4} file in your distribution,
the simplest is to concatenate the files @file{gettext.m4},
@file{host-cpu-c-abi.m4}, @file{intlmacosx.m4}, @file{iconv.m4},
@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4}, @file{nls.m4},
@file{po.m4}, @file{progtest.m4} from GNU @code{gettext}'s @file{m4/}
directory into a single file.

If you already have an @file{aclocal.m4} file, then you will have
to merge the said macro files into your @file{aclocal.m4}. Note that if
you are upgrading from a previous release of GNU @code{gettext}, you
should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
etc.), as they usually
change a little from one release of GNU @code{gettext} to the next.
Their contents may vary as we get more experience with strange systems
out there.

You should be using GNU @code{automake} 1.9 or newer. With it, you need
to copy the files @file{gettext.m4}, @file{host-cpu-c-abi.m4},
@file{intlmacosx.m4}, @file{iconv.m4}, @file{lib-ld.m4}, @file{lib-link.m4},
@file{lib-prefix.m4}, @file{nls.m4}, @file{po.m4}, @file{progtest.m4} from
GNU @code{gettext}'s @file{m4/} directory to a subdirectory named @file{m4/}
and add the line

@example
ACLOCAL_AMFLAGS = -I m4
@end example

@noindent
to your top level @file{Makefile.am}.

If you are using GNU @code{automake} 1.10 or newer, it is even easier:
Add the line

@example
ACLOCAL_AMFLAGS = --install -I m4
@end example

@noindent
to your top level @file{Makefile.am}, and run @samp{aclocal --install -I m4}.
This will copy the needed files to the @file{m4/} subdirectory automatically,
before updating @file{aclocal.m4}.
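
As a sketch of the resulting setup, assuming the macro files live in
@file{m4/}: many packages also declare that directory in
@file{configure.ac} through Autoconf's @code{AC_CONFIG_MACRO_DIR}
macro. This declaration is an addition that the steps above do not
require; it is shown here only as a common companion to the
@code{ACLOCAL_AMFLAGS} setting.

@example
AC_CONFIG_MACRO_DIR([m4])
@end example

@noindent
It would be placed near the top of @file{configure.ac}, naming the same
directory as the @samp{-I m4} option in @file{Makefile.am}.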

These macros check for the internationalization support functions
and related information. Hopefully, once stabilized, these macros
might be integrated in the standard Autoconf set, because this
piece of @code{m4} code will be the same for all projects using GNU
@code{gettext}.

@node config.h.in
@subsection @file{config.h.in} at top level
@cindex @file{config.h.in} file

The include file template that holds the C macros to be defined by
@code{configure} is usually called @file{config.h.in} and may be
maintained either manually or automatically.

If it is maintained automatically, by use of the @samp{autoheader}
program, you need to do nothing about it. This is the case in particular
if you are using GNU @code{automake}.

If it is maintained manually, it suffices to add the
following lines to @file{config.h.in}:

@example
/* Define to 1 if translation of program messages to the user's
   native language is requested. */
#undef ENABLE_NLS
@end example

@node Makefile
@subsection @file{Makefile.in} at top level

Here are a few modifications you need to make to your main, top-level
@file{Makefile.in} file.

@enumerate
@item
Add the following lines near the beginning of your @file{Makefile.in},
so the @samp{dist:} goal will work properly (as explained further down):

@example
PACKAGE = @@PACKAGE@@
VERSION = @@VERSION@@
@end example

@item
Wherever you process subdirectories in your @file{Makefile.in}, be sure
you also process the subdirectory @samp{po}. Special
rules in the @file{Makefiles} take care of the case where no
internationalization is wanted.

If you are using Makefiles, either generated by automake, or hand-written
so they carefully follow the GNU coding standards, the affected goals for
which the new subdirectories must be handled include @samp{installdirs},
@samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.

Here is an example of a canonical order of processing. In this
example, we also define @code{SUBDIRS} in @code{Makefile.in} for it
to be further used in the @samp{dist:} goal.

@example
SUBDIRS = doc lib src po
@end example

@item
A delicate point is the @samp{dist:} goal, as @file{po/Makefile} will later
assume that the proper directory has been set up from the main @file{Makefile}.
Here is an example of what the @samp{dist:} goal might look like:

@example
distdir = $(PACKAGE)-$(VERSION)
dist: Makefile
	rm -fr $(distdir)
	mkdir $(distdir)
	chmod 777 $(distdir)
	for file in $(DISTFILES); do \
	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
	done
	for subdir in $(SUBDIRS); do \
	  mkdir $(distdir)/$$subdir || exit 1; \
	  chmod 777 $(distdir)/$$subdir; \
	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
	done
	tar chozf $(distdir).tar.gz $(distdir)
	rm -fr $(distdir)
@end example

@end enumerate

Note that if you are using GNU @code{automake}, @file{Makefile.in} is
automatically generated from @file{Makefile.am}, and all needed changes
to @file{Makefile.am} are already made by running @samp{gettextize}.

@node src/Makefile
@subsection @file{Makefile.in} in @file{src/}

Some of the modifications made in the main @file{Makefile.in} will
also be needed in the @file{Makefile.in} from your package sources,
which we assume here to be in the @file{src/} subdirectory.
Here are
all the modifications needed in @file{src/Makefile.in}:

@enumerate
@item
In view of the @samp{dist:} goal, you should have these lines near the
beginning of @file{src/Makefile.in}:

@example
PACKAGE = @@PACKAGE@@
VERSION = @@VERSION@@
@end example

@item
If not done already, you should guarantee that @code{top_srcdir}
gets defined. This will serve for @code{cpp} include files. Just add
the line:

@example
top_srcdir = @@top_srcdir@@
@end example

@item
You might also want to define @code{subdir} as @samp{src}, later
allowing for almost uniform @samp{dist:} goals in all your
@file{Makefile.in}. At least, the @samp{dist:} goal below assumes that
you used:

@example
subdir = src
@end example

@item
The @code{main} function of your program will normally call
@code{bindtextdomain} (@pxref{Triggering}), like this:

@example
bindtextdomain (@var{PACKAGE}, LOCALEDIR);
textdomain (@var{PACKAGE});
@end example

On native Windows platforms, the @code{main} function may call
@code{wbindtextdomain} instead of @code{bindtextdomain}.

To make LOCALEDIR known to the program, add the following lines to
@file{Makefile.in}:

@example
datadir = @@datadir@@
datarootdir = @@datarootdir@@
localedir = @@localedir@@
DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
@end example

Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, and
@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.

@item
You should ensure that the final linking will use @code{@@LIBINTL@@} or
@code{@@LTLIBINTL@@} as a library. @code{@@LIBINTL@@} is for use without
@code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}.
An
easy way to achieve this is to arrange for it to get into @code{LIBS}, like
this:

@example
LIBS = @@LIBINTL@@ @@LIBS@@
@end example

In most packages internationalized with GNU @code{gettext}, one will
find a directory @file{lib/} in which a library containing some helper
functions will be built. (You need at least the few functions which the
GNU @code{gettext} Library itself needs.) However some of the functions
in @file{lib/} also give messages to the user which of course should be
translated, too. Taking care of this, the support library (say
@file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and
@code{@@LIBS@@} in the above example. So one has to write this:

@example
LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@
@end example

@item
Your @samp{dist:} goal has to conform with others. Here is a
reasonable definition for it:

@example
distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
dist: Makefile $(DISTFILES)
	for file in $(DISTFILES); do \
	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
	done
@end example

@end enumerate

Note that if you are using GNU @code{automake}, @file{Makefile.in} is
automatically generated from @file{Makefile.am}, and the first three
changes and the last change are not necessary. The remaining needed
@file{Makefile.am} modifications are the following:

@enumerate
@item
To make LOCALEDIR known to the program, add the following to
@file{Makefile.am}:

@example
<module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
@end example

@noindent
for each specific module or compilation unit, or

@example
AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
@end example

for all modules and compilation units together.
Furthermore, if you are
using an Autoconf version older than 2.60, add this line to define
@samp{localedir}:

@example
localedir = $(datadir)/locale
@end example

@item
To ensure that the final linking will use @code{@@LIBINTL@@} or
@code{@@LTLIBINTL@@} as a library, add the following to
@file{Makefile.am}:

@example
<program>_LDADD = @@LIBINTL@@
@end example

@noindent
for each specific program, or

@example
LDADD = @@LIBINTL@@
@end example

for all programs together. Remember that when you use @code{libtool}
to link a program, you need to use @code{@@LTLIBINTL@@} instead of
@code{@@LIBINTL@@} for that program.

@end enumerate

@node lib/gettext.h
@subsection @file{gettext.h} in @file{lib/}
@cindex @file{gettext.h} file
@cindex turning off NLS support
@cindex disabling NLS

Internationalization of packages, as provided by GNU @code{gettext}, is
optional. It can be turned off in two situations:

@itemize @bullet
@item
When the installer has specified @samp{./configure --disable-nls}. This
can be useful when small binaries are more important than features, for
example when building utilities for boot diskettes. It can also be useful
in order to get some specific C compiler warnings about code quality with
some older versions of GCC (older than 3.0).

@item
When the libintl.h header (with its associated libintl library, if any) is
not already installed on the system, it is preferable that the package builds
without internationalization support, rather than giving a compilation
error.
@end itemize

A C preprocessor macro can be used to detect these two cases. Usually,
when @code{libintl.h} was found and not explicitly disabled, the
@code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated
configuration file (usually called @file{config.h}).
In the two negative
situations, however, this macro will not be defined, thus it will evaluate
to 0 in C preprocessor expressions.

@cindex include file @file{libintl.h}
@file{gettext.h} is a convenience header file for conditional use of
@file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro. If
@code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it
defines no-op substitutes for the libintl.h functions. We recommend
the use of @code{"gettext.h"} over direct use of @file{<libintl.h>},
so that portability to older systems is guaranteed and installers can
turn off internationalization if they want to. In the C code, you will
then write

@example
#include "gettext.h"
@end example

@noindent
instead of

@example
#include <libintl.h>
@end example

The location of @code{gettext.h} is usually in a directory containing
auxiliary include files. In many GNU packages, there is a directory
@file{lib/} containing helper functions; @file{gettext.h} fits there.
In other packages, it can go into the @file{src} directory.

Do not install the @code{gettext.h} file in public locations. Every
package that needs it should contain its own copy.

@node autoconf macros
@section Autoconf macros for use in @file{configure.ac}
@cindex autoconf macros for @code{gettext}

GNU @code{gettext} installs macros for use in a package's
@file{configure.ac} or @file{configure.in}.
@xref{Top, , Introduction, autoconf, The Autoconf Manual}.
The primary macro is, of course, @code{AM_GNU_GETTEXT}.
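
To show how these macros can fit together, here is a minimal sketch of
a @file{configure.ac} for a package built with GNU @code{automake}.
The package name @samp{hello}, its version @samp{1.0} and the list of
output files are assumptions made for this example only; a real package
will normally need further checks.

@example
AC_INIT([hello], [1.0])
AM_INIT_AUTOMAKE
AC_PROG_CC
AM_GNU_GETTEXT_VERSION([@value{VERSION}])
AM_GNU_GETTEXT([external])
AC_CONFIG_HEADERS([config.h])
AC_CONFIG_FILES([Makefile po/Makefile.in src/Makefile])
AC_OUTPUT
@end example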

@menu
* AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
* AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
* AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
* AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
* AM_XGETTEXT_OPTION::          AM_XGETTEXT_OPTION in @file{po.m4}
* AM_ICONV::                    AM_ICONV in @file{iconv.m4}
@end menu

@node AM_GNU_GETTEXT
@subsection AM_GNU_GETTEXT in @file{gettext.m4}

@amindex AM_GNU_GETTEXT
The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext
function family in either the C library or a separate @code{libintl}
library (shared or static libraries are both supported). It also invokes
@code{AM_PO_SUBDIRS}, thus preparing the @file{po/} directories of the
package for building.

@code{AM_GNU_GETTEXT} accepts up to two optional arguments. The general
syntax is

@example
AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}])
@end example

@c We don't document @var{intlsymbol} = @samp{use-libtool} here, because
@c it is of no use for packages other than GNU gettext itself. (Such packages
@c are not allowed to install the shared libintl. But if they use libtool,
@c then it is in order to install shared libraries that depend on libintl.)
@var{intlsymbol} should always be @samp{external}.

If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU
gettext implementations (in libc or libintl) without the @code{ngettext()}
function will be ignored. If @var{needsymbol} is specified and is
@samp{need-formatstring-macros}, then GNU gettext implementations that don't
support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored.
Only one @var{needsymbol} can be specified. These requirements can also be
specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere.
To specify
more than one requirement, just specify the strongest one among them, or
invoke the @code{AM_GNU_GETTEXT_NEED} macro several times. The hierarchy
among the various alternatives is as follows: @samp{need-formatstring-macros}
implies @samp{need-ngettext}.

The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is
available and should be used. If so, it sets the @code{USE_NLS} variable
to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf
generated configuration file (usually called @file{config.h}); it sets
the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options
for use in a Makefile (@code{LIBINTL} for use without libtool,
@code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to
@code{CPPFLAGS} if necessary. In the negative case, it sets
@code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL}
to empty and doesn't change @code{CPPFLAGS}.

The complexities that @code{AM_GNU_GETTEXT} deals with are the following:

@itemize @bullet
@item
@cindex @code{libintl} library
Some operating systems have @code{gettext} in the C library, for example
glibc. Some have it in a separate library @code{libintl}. GNU @code{libintl}
might have been installed as part of the GNU @code{gettext} package.

@item
GNU @code{libintl}, if installed, is not necessarily already in the search
path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
the library search path).

@item
Except for glibc, the operating system's native @code{gettext} cannot
exploit the GNU mo files, doesn't have the necessary locale dependency
features, and cannot convert messages from the catalog's text encoding
to the user's locale encoding.

@item
GNU @code{libintl}, if installed, is not necessarily already in the
run time library search path.
To avoid the need for setting an environment
variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
run time search path options to the @code{LIBINTL} and @code{LTLIBINTL}
variables. This works on most systems, but not on some operating systems
with limited shared library support, like SCO.

@item
GNU @code{libintl} relies on POSIX/XSI @code{iconv}. The macro checks for
linker options needed to use iconv and appends them to the @code{LIBINTL}
and @code{LTLIBINTL} variables.
@end itemize

@node AM_GNU_GETTEXT_VERSION
@subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4}

@amindex AM_GNU_GETTEXT_VERSION
The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of
the GNU gettext infrastructure that is used by the package.

The use of this macro is optional; only the @code{autopoint} program makes
use of it (@pxref{Version Control Issues}).

@node AM_GNU_GETTEXT_NEED
@subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4}

@amindex AM_GNU_GETTEXT_NEED
The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the
GNU gettext implementation. The syntax is

@example
AM_GNU_GETTEXT_NEED([@var{needsymbol}])
@end example

If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations
(in libc or libintl) without the @code{ngettext()} function will be ignored.
If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext
implementations that don't support the ISO C 99 @file{<inttypes.h>}
formatstring macros will be ignored.

The optional second argument of @code{AM_GNU_GETTEXT} is also taken into
account.

The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after
the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
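
For instance, going by the description above, the following two lines in
@file{configure.ac} should declare the same requirement as the single
invocation @code{AM_GNU_GETTEXT([external], [need-ngettext])}. This is
a sketch of the two equivalent spellings, not a complete
@file{configure.ac}.

@example
AM_GNU_GETTEXT_NEED([need-ngettext])
AM_GNU_GETTEXT([external])
@end example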

@node AM_PO_SUBDIRS
@subsection AM_PO_SUBDIRS in @file{po.m4}

@amindex AM_PO_SUBDIRS
The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the
package for building. This macro should be used in internationalized
programs written in programming languages other than C, C++, Objective C,
for example @code{sh}, @code{Python}, @code{Lisp}.
See @ref{Programming Languages} for a list of programming languages that
support localization through PO files.

The @code{AM_PO_SUBDIRS} macro determines whether internationalization
should be used. If so, it sets the @code{USE_NLS} variable to @samp{yes},
otherwise to @samp{no}. It also determines the right values for Makefile
variables in each @file{po/} directory.

@node AM_XGETTEXT_OPTION
@subsection AM_XGETTEXT_OPTION in @file{po.m4}

@amindex AM_XGETTEXT_OPTION
The @code{AM_XGETTEXT_OPTION} macro registers a command-line option to be
used in the invocations of @code{xgettext} in the @file{po/} directories
of the package.

For example, if you have a source file that defines a function
@samp{error_at_line} whose fifth argument is a format string, you can use
@example
AM_XGETTEXT_OPTION([--flag=error_at_line:5:c-format])
@end example
@noindent
to instruct @code{xgettext} to mark all translatable strings in @samp{gettext}
invocations that occur as fifth argument to this function as @samp{c-format}.

See @ref{xgettext Invocation} for the list of options that @code{xgettext}
accepts.

The use of this macro is an alternative to the use of the
@samp{XGETTEXT_OPTIONS} variable in @file{po/Makevars}.

@node AM_ICONV
@subsection AM_ICONV in @file{iconv.m4}

@amindex AM_ICONV
The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI
@code{iconv} function family in either the C library or a separate
@code{libiconv} library.
If found, it sets the @code{am_cv_func_iconv}
variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf
generated configuration file (usually called @file{config.h}); it defines
@code{ICONV_CONST} to @samp{const} or to empty, depending on whether the
second argument of @code{iconv()} is of type @samp{const char **} or
@samp{char **}; it sets the variables @code{LIBICONV} and
@code{LTLIBICONV} to the linker options for use in a Makefile
(@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with
libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if
necessary. If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to
empty and doesn't change @code{CPPFLAGS}.

The complexities that @code{AM_ICONV} deals with are the following:

@itemize @bullet
@item
@cindex @code{libiconv} library
Some operating systems have @code{iconv} in the C library, for example
glibc. Some have it in a separate library @code{libiconv}, for example
OSF/1 or FreeBSD. Regardless of the operating system, GNU @code{libiconv}
might have been installed. In that case, it should be used instead of the
operating system's native @code{iconv}.

@item
GNU @code{libiconv}, if installed, is not necessarily already in the search
path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
the library search path).

@item
GNU @code{libiconv} is binary incompatible with the native @code{iconv}
of some operating systems, for example on FreeBSD. Use of an @file{iconv.h}
and @file{libiconv.so} that don't fit together would produce program
crashes.

@item
GNU @code{libiconv}, if installed, is not necessarily already in the
run time library search path.
To avoid the need for setting an environment
variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
run time search path options to the @code{LIBICONV} variable. This works
on most systems, but not on some operating systems with limited shared
library support, like SCO.
@end itemize

@file{iconv.m4} is distributed with the GNU gettext package because
@file{gettext.m4} relies on it.

@node Version Control Issues
@section Integrating with Version Control Systems

Many projects use version control systems for distributed development
and source backup. This section gives some advice on how to manage the
uses of @code{gettextize}, @code{autopoint} and @code{autoconf} on
version controlled files.

@menu
* Distributed Development::     Avoiding version mismatch in distributed development
* Files under Version Control::  Files to put under version control
* Translations under Version Control::  Put PO Files under Version Control
* autopoint Invocation::        Invoking the @code{autopoint} Program
@end menu

@node Distributed Development
@subsection Avoiding version mismatch in distributed development

In a project with multiple developers, there should be a
single developer who occasionally - when there is a desire to upgrade to
a new @code{gettext} version - runs @code{gettextize} and performs the
changes listed in @ref{Adjusting Files}, and then commits his changes
to the repository.

It is highly recommended that all developers on a project use the same
version of GNU @code{gettext} in the package. In other words, if a
developer runs @code{gettextize}, he should go the whole way, make the
necessary remaining changes and commit his changes to the repository.
Otherwise the following problems are likely to occur:

@itemize @bullet
@item
Apparent version mismatch between developers.
Since some @code{gettext}
specific portions in @file{configure.ac}, @file{configure.in} and
@code{Makefile.am}, @code{Makefile.in} files depend on the @code{gettext}
version, the use of infrastructure files belonging to different
@code{gettext} versions can easily lead to build errors.

@item
Hidden version mismatch. Such version mismatch can also lead to
malfunctioning of the package, which may go undiscovered by the developers.
The worst case of hidden version mismatch is that internationalization
of the package doesn't work at all.

@item
Release risks. All developers implicitly perform constant testing on
a package. This is important in the days and weeks before a release.
If the person who makes the release tar files uses a different version
of GNU @code{gettext} than the other developers, the distribution will
be less well tested than if all had been using the same @code{gettext}
version. For example, it is possible that a platform specific bug goes
undiscovered due to this constellation.
@end itemize

@node Files under Version Control
@subsection Files to put under version control

There are basically three ways to deal with generated files in the
context of a version controlled repository, such as @file{configure}
generated from @file{configure.ac}, @code{@var{parser}.c} generated
from @code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled
by @code{gettextize} or @code{autopoint}.

@enumerate
@item
All generated files are always committed into the repository.

@item
All generated files are committed into the repository occasionally,
for example each time a release is made.

@item
Generated files are never committed into the repository.
@end enumerate

Each of these three approaches has different advantages and drawbacks.

@enumerate
@item
The advantage is that anyone can check out the source at any moment and
gets a working build. The drawbacks are: 1a. It requires some frequent
"push" actions by the maintainers. 1b. The repository grows in size
quite fast.

@item
The advantage is that anyone can check out the source, and the usual
"./configure; make" will work. The drawbacks are: 2a. The one who
checks out the repository needs tools like GNU @code{automake}, GNU
@code{autoconf}, GNU @code{m4} installed in his PATH; sometimes he
even needs particular versions of them. 2b. When a release is made
and a commit is made on the generated files, the other developers get
conflicts on the generated files when merging the local work back to
the repository. Although these conflicts are easy to resolve, they
are annoying.

@item
The advantage is less work for the maintainers. The drawback is that
anyone who checks out the source not only needs tools like GNU
@code{automake}, GNU @code{autoconf}, GNU @code{m4} installed in his
PATH, but also that he needs to perform a package specific pre-build
step before being able to "./configure; make".
@end enumerate

For the first and second approach, all files modified or brought in
by the occasional @code{gettextize} invocation and update should be
committed into the repository.

For the third approach, the maintainer can omit from the repository
all the files that @code{gettextize} mentions as "copy". Instead, he
adds to the @file{configure.ac} or @file{configure.in} a line of the
form

@example
AM_GNU_GETTEXT_VERSION(@value{ARCHIVE-VERSION})
@end example

@noindent
and adds to the package's pre-build script an invocation of
@samp{autopoint}.
For everyone who checks out the source, this
@code{autopoint} invocation will copy into the right place the
@code{gettext} infrastructure files that have been omitted from the repository.

The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is
the version of the @code{gettext} infrastructure that the package wants
to use. It is also the minimum version number of the @samp{autopoint}
program. So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the
developers can have any version >= 0.11.5 installed; the package will work
with the 0.11.5 infrastructure in all developers' builds. When the
maintainer then runs @code{gettextize} from, say, version 0.12.1 on the package,
the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed
into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that
use the repository will henceforth need to have GNU @code{gettext} 0.12.1
or newer installed.

@node Translations under Version Control
@subsection Put PO Files under Version Control

Since translations are as valuable an asset as the source code, it
makes sense to put them under version control. The GNU gettext
infrastructure supports two ways to deal with translations in the
context of a version controlled repository.

@enumerate
@item
Both POT file and PO files are committed into the repository.

@item
Only PO files are committed into the repository.

@end enumerate

If a POT file is absent when building, it will be generated by
scanning the source files with @code{xgettext}, and then the PO files
are regenerated as a dependency. On the other hand, some maintainers
want to keep the POT file unchanged during the development phase. So,
even if a POT file is present and older than the source code, it won't
be updated automatically.
You can manually update it with
@code{make $(DOMAIN).pot-update}, and commit it at a suitable point.

Special advice for particular version control systems:

@itemize @bullet
@item
Recent version control systems, Git for instance, ignore files'
timestamps. In that case, PO files can be accidentally updated even if
a POT file is not updated. To prevent this, you can set the
@samp{PO_DEPENDS_ON_POT} variable to @code{no} in the @file{Makevars}
file and do @code{make update-po} manually.

@item
Location comments such as @code{#: lib/error.c:116} are sometimes
annoying, since these comments are volatile and may introduce unwanted
changes to the working copy when building. To mitigate this, you can
decide to omit those comments from the PO files in the repository.

This is possible with the @code{--no-location} option of the
@code{msgmerge} command@footnote{You can also use it through the
@samp{MSGMERGE_OPTIONS} variable in @file{Makevars}.}. The drawback is
that, if the location information is needed, translators have to
recover the location comments by running @code{msgmerge} again.

@end itemize

@node autopoint Invocation
@subsection Invoking the @code{autopoint} Program

@include autopoint.texi

@node Release Management
@section Creating a Distribution Tarball

@cindex release
@cindex distribution tarball
In projects that use GNU @code{automake}, the usual commands for creating
a distribution tarball, @samp{make dist} or @samp{make distcheck},
automatically update the PO files as needed.

If GNU @code{automake} is not used, the maintainer needs to perform this
update before making a release:

@example
$ ./configure
$ (cd po; make update-po)
$ make distclean
@end example

@node Installers
@chapter The Installer's and Distributor's View
@cindex package installer's view of @code{gettext}
@cindex package distributor's view of @code{gettext}
@cindex package build and installation options
@cindex setting up @code{gettext} at build time

By default, packages fully using GNU @code{gettext}, internally,
are installed in such a way as to allow translation of
messages. At @emph{configuration} time, those packages should
automatically detect whether the underlying host system already provides
the GNU @code{gettext} functions. If not,
the GNU @code{gettext} library should be automatically prepared
and used. Installers may use special options at configuration
time for changing this behavior. The command
@samp{./configure --with-included-gettext} bypasses system @code{gettext} to
use the included GNU @code{gettext} instead,
while @samp{./configure --disable-nls}
produces programs totally unable to translate messages.

@vindex LINGUAS@r{, environment variable}
Internationalized packages usually have many @file{@var{ll}.po}
files. Unless
translations are disabled, all those available are installed together
with the package. However, the environment variable @code{LINGUAS}
may be set, prior to configuration, to limit the installed set.
@code{LINGUAS} should then contain a space separated list of two-letter
codes, stating which languages are allowed.
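
For instance, an installer who wants only the German and French catalogs
might configure a package as sketched below. The language codes are chosen
purely for illustration, and the package is assumed to ship @file{de.po}
and @file{fr.po}.

@example
$ LINGUAS="de fr" ./configure
$ make
$ make install
@end example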

@node Programming Languages
@chapter Other Programming Languages

While the presentation of @code{gettext} focuses mostly on C and
implicitly applies to C++ as well, its scope is far broader than that:
Many programming languages, scripting languages and other textual data
like GUI resources or package descriptions can make use of the gettext
approach.

@menu
* Language Implementors::       The Language Implementor's View
* Programmers for other Languages::  The Programmer's View
* Translators for other Languages::  The Translator's View
* Maintainers for other Languages::  The Maintainer's View
* List of Programming Languages::  Individual Programming Languages
@end menu

@node Language Implementors
@section The Language Implementor's View
@cindex programming languages
@cindex scripting languages

All programming and scripting languages that have the notion of strings
are eligible to support @code{gettext}. Supporting @code{gettext}
means the following:

@enumerate
@item
You should add to the language a syntax for translatable strings. In
principle, a function call of @code{gettext} would do, but a shorthand
syntax helps keep internationalized programs legible. For
example, in C we use the syntax @code{_("string")}, and in GNU awk we use
the shorthand @code{_"string"}.

@item
You should arrange that evaluation of such a translatable string at
runtime calls the @code{gettext} function, or performs equivalent
processing.

@item
Similarly, you should make the functions @code{ngettext},
@code{dcgettext}, @code{dcngettext} available from within the language.
These functions are less often used, but are nevertheless necessary for
particular purposes: @code{ngettext} for correct plural handling, and
@code{dcgettext} and @code{dcngettext} for obeying locale-related
environment variables other than @code{LC_MESSAGES}, such as @code{LC_TIME} or
@code{LC_MONETARY}. For these latter functions, you need to make the
@code{LC_*} constants, available in the C header @code{<locale.h>},
referenceable from within the language, usually either as enumeration
values or as strings.

@item
You should allow the programmer to designate a message domain, either by
making the @code{textdomain} function available from within the
language, or by introducing a magic variable called @code{TEXTDOMAIN}.
Similarly, you should allow the programmer to designate where to search
for message catalogs, by providing access to the @code{bindtextdomain}
function or, on native Windows platforms, to the @code{wbindtextdomain}
function.

@item
You should either perform a @code{setlocale (LC_ALL, "")} call during
the startup of your language runtime, or allow the programmer to do so.
Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
@code{LC_CTYPE} locale categories are not both set.

@item
A programmer should have a way to extract translatable strings from a
program into a PO file. The GNU @code{xgettext} program is being
extended to support very different programming languages. Please
contact the GNU @code{gettext} maintainers to help them do this.
The GNU @code{gettext} maintainers will need from you a formal
description of the lexical structure of source files. It should
answer the questions:
@itemize @bullet
@item
What does a token look like?
@item
What does a string literal look like? What escape characters exist
inside a string?
@item
What escape characters exist outside of strings? If Unicode escapes
are supported, are they applied before or after tokenization?
@item
What is the syntax for function calls? How are consecutive arguments
in the same function call separated?
@item
What is the syntax for comments?
@end itemize
@noindent Based on this description, the GNU @code{gettext} maintainers
can add support to @code{xgettext}.

If the string extractor is best integrated into your language's parser,
GNU @code{xgettext} can function as a front end to your string extractor.

@item
The language's library should have a string formatting facility.
Additionally:
@enumerate
@item
There must be a way, in the format string, to denote the arguments by a
positional number or a name. This is needed because for some languages
and some messages with more than one substitutable argument, the
translation will need to output the substituted arguments in a different
order. @xref{c-format Flag}.
@item
The syntax of format strings must be documented in a way that translators
can understand. The GNU @code{gettext} manual will be extended to
include a pointer to this documentation.
@end enumerate
Based on this, the GNU @code{gettext} maintainers can add a format string
equivalence checker to @code{msgfmt}, so that translators get told
immediately when they have made a mistake during the translation of a
format string.

@item
If the language has more than one implementation, and not all of the
implementations use @code{gettext}, but the programs should be portable
across implementations, you should provide a no-i18n emulation that
makes the other implementations accept programs written for yours,
without actually translating the strings.

@item
To help the programmer in the task of marking translatable strings,
which is sometimes performed using the Emacs PO mode (@pxref{Marking}),
you are welcome to
contact the GNU @code{gettext} maintainers, so they can add support for
your language to @file{po-mode.el}.
@end enumerate

On the implementation side, two approaches are possible, with
different effects on portability and copyright:

@itemize @bullet
@item
You may link against GNU @code{gettext} functions if they are found in
the C library. For example, an autoconf test for @code{gettext()} and
@code{ngettext()} will detect this situation. For the moment, this test
will succeed on GNU systems and on Solaris 11 platforms. No severe
copyright restrictions apply, except if you want to distribute statically
linked binaries.

@item
You may emulate or reimplement the GNU @code{gettext} functionality.
This has the advantage of full portability and no copyright
restrictions, but also the drawback that you have to reimplement the GNU
@code{gettext} features (such as the @code{LANGUAGE} environment
variable, the locale aliases database, the automatic charset conversion,
and plural handling).
@end itemize

@node Programmers for other Languages
@section The Programmer's View

For the programmer, the general procedure is the same as for the C
language. The Emacs PO mode marking supports other languages, and the GNU
@code{xgettext} string extractor recognizes other languages based on the
file extension or a command-line option. In some languages,
@code{setlocale} is not needed because it is already performed by the
underlying language runtime.

@node Translators for other Languages
@section The Translator's View

The translator works exactly as in the C language case.
The only
difference is that when translating format strings, she has to be aware
of the language's particular syntax for positional arguments in format
strings.

@menu
* c-format::                    C Format Strings
* objc-format::                 Objective C Format Strings
* python-format::               Python Format Strings
* java-format::                 Java Format Strings
* csharp-format::               C# Format Strings
* javascript-format::           JavaScript Format Strings
* scheme-format::               Scheme Format Strings
* lisp-format::                 Lisp Format Strings
* elisp-format::                Emacs Lisp Format Strings
* librep-format::               librep Format Strings
* ruby-format::                 Ruby Format Strings
* sh-format::                   Shell Format Strings
* awk-format::                  awk Format Strings
* lua-format::                  Lua Format Strings
* object-pascal-format::        Object Pascal Format Strings
* smalltalk-format::            Smalltalk Format Strings
* qt-format::                   Qt Format Strings
* qt-plural-format::            Qt Plural Format Strings
* kde-format::                  KDE Format Strings
* kde-kuit-format::             KUIT Format Strings
* boost-format::                Boost Format Strings
* tcl-format::                  Tcl Format Strings
* perl-format::                 Perl Format Strings
* php-format::                  PHP Format Strings
* gcc-internal-format::         GCC internal Format Strings
* gfc-internal-format::         GFC internal Format Strings
* ycp-format::                  YCP Format Strings
@end menu

@node c-format
@subsection C Format Strings

C format strings are described in POSIX (IEEE P1003.1 2001), section
XSH 3 fprintf(),
@uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}.
See also the fprintf() manual page,
@uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php},
@uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}.

Although format strings with positions that reorder arguments, such as

@example
"Only %2$d bytes free on '%1$s'."
@end example

@noindent
which is semantically equivalent to

@example
"'%s' has only %d bytes free."
@end example

@noindent
are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
on this reordering ability: On the few platforms where @code{printf()},
@code{fprintf()} etc. don't support this feature natively, @file{libintl.a}
or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>}
activates these replacement functions automatically.

@cindex outdigits
@cindex Arabic digits
As a special feature for Farsi (Persian) and maybe Arabic, translators can
insert an @samp{I} flag into numeric format directives. For example, the
translation of @code{"%d"} can be @code{"%Id"}. The effect of this flag,
on systems with GNU @code{libc}, is that in the output, the ASCII digits are
replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
category. On other systems, the @code{gettext} function removes this flag,
so that it has no effect.

Note that the programmer should @emph{not} put this flag into the
untranslated string. (Putting the @samp{I} format directive flag into an
@var{msgid} string would lead to undefined behaviour on platforms without
glibc when NLS is disabled.)

@node objc-format
@subsection Objective C Format Strings

Objective C format strings are like C format strings. They support an
additional format directive: "%@@", which when executed consumes an argument
of type @code{Object *}.

@node python-format
@subsection Python Format Strings

There are two kinds of format strings in Python: those acceptable to
the Python built-in format operator @code{%}, labelled as
@samp{python-format}, and those acceptable to the @code{format} method
of the @samp{str} object, labelled as @samp{python-brace-format}.

Python @code{%} format strings are described in
@w{Python Library reference} /
@w{5. Built-in Types} /
@w{5.6. Sequence Types} /
@w{5.6.2. String Formatting Operations},
@uref{https://docs.python.org/2/library/stdtypes.html#string-formatting-operations}.

Python brace format strings are described in
@w{PEP 3101 -- Advanced String Formatting},
@uref{https://www.python.org/dev/peps/pep-3101/}.

@node java-format
@subsection Java Format Strings

There are two kinds of format strings in Java: those acceptable to the
@code{MessageFormat.format} function, labelled as @samp{java-format},
and those acceptable to the @code{String.format} and
@code{PrintStream.printf} functions, labelled as @samp{java-printf-format}.

Java format strings are described in the JDK documentation for class
@code{java.text.MessageFormat},
@uref{https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html}.
See also the ICU documentation
@uref{http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html}.

Java @code{printf} format strings are described in the JDK documentation
for class @code{java.util.Formatter},
@uref{https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html}.

@node csharp-format
@subsection C# Format Strings

C# format strings are described in the .NET documentation for class
@code{System.String} and in
@uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}.

@node javascript-format
@subsection JavaScript Format Strings

Although the JavaScript specification itself does not define any format
strings, many JavaScript implementations provide printf-like
functions. @code{xgettext} understands a set of common format strings
used in popular JavaScript implementations including Gjs, Seed, and
Node.JS.
In such a format string, a directive starts with @samp{%}
and is finished by a specifier: @samp{%} denotes a literal percent
sign, @samp{c} denotes a character, @samp{s} denotes a string,
@samp{b}, @samp{d}, @samp{o}, @samp{x}, @samp{X} denote an integer,
@samp{f} denotes a floating-point number, @samp{j} denotes a JSON
object.

@node scheme-format
@subsection Scheme Format Strings

Scheme format strings are documented in the SLIB manual, section
@w{Format Specification}.

@node lisp-format
@subsection Lisp Format Strings

Lisp format strings are described in the Common Lisp HyperSpec,
chapter 22.3 @w{Formatted Output},
@uref{http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/sec_22-3.html}.

@node elisp-format
@subsection Emacs Lisp Format Strings

Emacs Lisp format strings are documented in the Emacs Lisp reference,
section @w{Formatting Strings},
@uref{https://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}.
Note that as of version 21, XEmacs supports numbered argument specifications
in format strings while FSF Emacs doesn't.

@node librep-format
@subsection librep Format Strings

librep format strings are documented in the librep manual, section
@w{Formatted Output},
@url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output},
@url{http://www.gwinnup.org/research/docs/librep.html#SEC122}.

@node ruby-format
@subsection Ruby Format Strings

Ruby format strings are described in the documentation of the Ruby
functions @code{format} and @code{sprintf}, in
@uref{https://ruby-doc.org/core-2.7.1/Kernel.html#method-i-sprintf}.

There are two kinds of format strings in Ruby:
@itemize @bullet
@item
Those that take a list of arguments without names. They support
argument reordering by use of the @code{%@var{n}$} syntax.
Note
that if one argument uses this syntax, all must use this syntax.
@item
Those that take a hash table, containing named arguments. The
syntax is @code{%<@var{name}>}. Note that @code{%@{@var{name}@}} is
equivalent to @code{%<@var{name}>s}.
@end itemize

@node sh-format
@subsection Shell Format Strings

Shell format strings, as supported by GNU gettext and the @samp{envsubst}
program, are strings with references to shell variables in the form
@code{$@var{variable}} or @code{$@{@var{variable}@}}. References of the form
@code{$@{@var{variable}-@var{default}@}},
@code{$@{@var{variable}:-@var{default}@}},
@code{$@{@var{variable}=@var{default}@}},
@code{$@{@var{variable}:=@var{default}@}},
@code{$@{@var{variable}+@var{replacement}@}},
@code{$@{@var{variable}:+@var{replacement}@}},
@code{$@{@var{variable}?@var{ignored}@}},
@code{$@{@var{variable}:?@var{ignored}@}},
that would be valid inside shell scripts, are not supported. The
@var{variable} names must consist solely of alphanumeric or underscore
ASCII characters, not start with a digit and be nonempty; otherwise such
a variable reference is ignored.

@node awk-format
@subsection awk Format Strings

awk format strings are described in the gawk documentation, section
@w{Printf},
@uref{https://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}.

@node lua-format
@subsection Lua Format Strings

Lua format strings are described in the Lua reference manual, section @w{String Manipulation},
@uref{https://www.lua.org/manual/5.1/manual.html#pdf-string.format}.

@node object-pascal-format
@subsection Object Pascal Format Strings

Object Pascal format strings are described in the documentation of the
Free Pascal runtime library, section Format,
@uref{https://www.freepascal.org/docs-html/rtl/sysutils/format.html}.

@node smalltalk-format
@subsection Smalltalk Format Strings

Smalltalk format strings are described in the GNU Smalltalk documentation,
class @code{CharArray}, methods @samp{bindWith:} and
@samp{bindWithArguments:},
@uref{https://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}.
In summary, a directive starts with @samp{%} and is followed by @samp{%}
or a nonzero digit (@samp{1} to @samp{9}).

@node qt-format
@subsection Qt Format Strings

Qt format strings are described in the documentation of the QString class
@uref{file:/usr/lib/qt-4.3.0/doc/html/qstring.html}.
In summary, a directive consists of a @samp{%} followed by a digit.  The same
directive cannot occur more than once in a format string.

@node qt-plural-format
@subsection Qt Plural Format Strings

Qt plural format strings are described in the documentation of the
QObject::tr method
@uref{file:/usr/lib/qt-4.3.0/doc/html/qobject.html}.
In summary, the only allowed directive is @samp{%n}.

@node kde-format
@subsection KDE Format Strings

KDE 4 format strings are defined as follows:
A directive consists of a @samp{%} followed by a non-zero decimal number.
If a @samp{%n} occurs in a format string, all of @samp{%1}, ..., @samp{%(n-1)}
must occur as well, except possibly one of them.

@node kde-kuit-format
@subsection KUIT Format Strings

KUIT (KDE User Interface Text) is compatible with KDE 4 format strings,
while it also allows programmers to add semantic information to a format
string, through XML markup tags.  For example, if the first format
directive in a string is a filename, programmers could indicate that
with a @samp{filename} tag, like @samp{<filename>%1</filename>}.

KUIT format strings are described in
@uref{https://api.kde.org/frameworks/ki18n/html/prg_guide.html#kuit_markup}.

@node boost-format
@subsection Boost Format Strings

Boost format strings are described in the documentation of the
@code{boost::format} class, at
@uref{https://www.boost.org/libs/format/doc/format.html}.
In summary, a directive has either the same syntax as in a C format string,
such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as
@samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number
between percent signs, such as @samp{%1%}.

@node tcl-format
@subsection Tcl Format Strings

Tcl format strings are described in the @file{format.n} manual page,
@uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.

@node perl-format
@subsection Perl Format Strings

There are two kinds of format strings in Perl: those acceptable to the
Perl built-in function @code{printf}, labelled as @samp{perl-format},
and those acceptable to the @code{libintl-perl} function @code{__x},
labelled as @samp{perl-brace-format}.

Perl @code{printf} format strings are described in the @code{sprintf}
section of @samp{man perlfunc}.

Perl brace format strings are described in the
@file{Locale::TextDomain(3pm)} manual page of the CPAN package
libintl-perl.  In brief, Perl format uses placeholders put between
braces (@samp{@{} and @samp{@}}).  The placeholder must have the syntax
of simple identifiers.

@node php-format
@subsection PHP Format Strings

PHP format strings are described in the documentation of the PHP function
@code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
@uref{http://www.php.net/manual/en/function.sprintf.php}.

@node gcc-internal-format
@subsection GCC internal Format Strings

These format strings are used inside the GCC sources.
In such a format
string, a directive starts with @samp{%}, is optionally followed by a
size specifier @samp{l}, an optional flag @samp{+}, another optional flag
@samp{#}, and is finished by a specifier: @samp{%} denotes a literal
percent sign, @samp{c} denotes a character, @samp{s} denotes a string,
@samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x}
denote an unsigned integer, @samp{.*s} denotes a string preceded by a
width specification, @samp{H} denotes a @samp{location_t *} pointer,
@samp{D} denotes a general declaration, @samp{F} denotes a function
declaration, @samp{T} denotes a type, @samp{A} denotes a function argument,
@samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L}
denotes a programming language, @samp{O} denotes a binary operator,
@samp{P} denotes a function parameter, @samp{Q} denotes an assignment
operator, @samp{V} denotes a const/volatile qualifier.

@node gfc-internal-format
@subsection GFC internal Format Strings

These format strings are used inside the GNU Fortran Compiler sources,
that is, the Fortran frontend in the GCC sources.  In such a format
string, a directive starts with @samp{%} and is finished by a
specifier: @samp{%} denotes a literal percent sign, @samp{C} denotes the
current source location, @samp{L} denotes a source location, @samp{c}
denotes a character, @samp{s} denotes a string, @samp{i} and @samp{d}
denote an integer, @samp{u} denotes an unsigned integer.  @samp{i},
@samp{d}, and @samp{u} may be preceded by a size specifier @samp{l}.

@node ycp-format
@subsection YCP Format Strings

YCP sformat strings are described in the libycp documentation
@uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}.
In summary, a directive starts with @samp{%} and is followed by @samp{%}
or a nonzero digit (@samp{1} to @samp{9}).


@node Maintainers for other Languages
@section The Maintainer's View

For the maintainer, the general procedure differs from the C language
case:

@itemize @bullet
@item
If only a single programming language is used, the @code{XGETTEXT_OPTIONS}
variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to
match the @code{xgettext} options for that particular programming language.
If the package uses more than one programming language with @code{gettext}
support, it becomes necessary to change the POT file construction rule
in @file{po/Makefile.in.in}.  It is recommended to make one @code{xgettext}
invocation per programming language, each with the options appropriate for
that language, and to combine the resulting files using @code{msgcat}.
@end itemize

@node List of Programming Languages
@section Individual Programming Languages

@c Here is a list of programming languages, as used for Free Software projects
@c on SourceForge/Freshmeat, as of February 2002.  Those supported by gettext
@c are marked with a star.
@c   C                       3580     *
@c   Perl                    1911     *
@c   C++                     1379     *
@c   Java                    1200     *
@c   PHP                     1051     *
@c   Python                   613     *
@c   Unix Shell               357     *
@c   Tcl                      266     *
@c   SQL                      174
@c   JavaScript               118
@c   Assembly                 108
@c   Scheme                    51
@c   Ruby                      47
@c   Lisp                      45     *
@c   Objective C               39     *
@c   PL/SQL                    29
@c   Fortran                   25
@c   Ada                       24
@c   Delphi                    22
@c   Awk                       19     *
@c   Pascal                    19
@c   ML                        19
@c   Eiffel                    17
@c   Emacs-Lisp                14     *
@c   Zope                      14
@c   ASP                       12
@c   Forth                     12
@c   Cold Fusion               10
@c   Haskell                    9
@c   Visual Basic               9
@c   C#                         6     *
@c   Smalltalk                  6     *
@c   Basic                      5
@c   Erlang                     5
@c   Modula                     5
@c   Object Pascal              5     *
@c   Rexx                       5
@c   Dylan                      4
@c   Prolog                     4
@c   APL                        3
@c   PROGRESS                   2
@c   Euler                      1
@c   Euphoria                   1
@c   Pliant                     1
@c   Simula                     1
@c   XBasic                     1
@c   Logo                       0
@c   Other Scripting Engines   49
@c   Other                    116

@menu
* C::                           C, C++, Objective C
* Python::                      Python
* Java::                        Java
* C#::                          C#
* JavaScript::                  JavaScript
* Scheme::                      GNU guile - Scheme
* Common Lisp::                 GNU clisp - Common Lisp
* clisp C::                     GNU clisp C sources
* Emacs Lisp::                  Emacs Lisp
* librep::                      librep
* Ruby::                        Ruby
* sh::                          sh - Shell Script
* bash::                        bash - Bourne-Again Shell Script
* gawk::                        GNU awk
* Lua::                         Lua
* Pascal::                      Pascal - Free Pascal Compiler
* Smalltalk::                   GNU Smalltalk
* Vala::                        Vala
* wxWidgets::                   wxWidgets library
* Tcl::                         Tcl - Tk's scripting language
* Perl::                        Perl
* PHP::                         PHP Hypertext Preprocessor
* Pike::                        Pike
* GCC-source::                  GNU Compiler Collection sources
* YCP::                         YCP - YaST2 scripting language
@end menu

@include lang-c.texi
@include lang-python.texi
@include lang-java.texi
@include lang-csharp.texi
@include lang-javascript.texi
@include lang-scheme.texi
@include lang-lisp.texi
@include lang-clisp-c.texi
@include lang-elisp.texi
@include lang-librep.texi
@include lang-ruby.texi
@include lang-sh.texi
@include lang-bash.texi
@include lang-gawk.texi
@include lang-lua.texi
@include lang-pascal.texi
@include lang-smalltalk.texi
@include lang-vala.texi
@include lang-wxwidgets.texi
@include lang-tcl.texi
@include lang-perl.texi
@include lang-php.texi
@include lang-pike.texi
@include lang-gcc-source.texi
@include lang-ycp.texi

@c This is the template for new languages.
@ignore

@ node
@ subsection

@table @asis
@item RPMs

@item Ubuntu packages

@item File extension

@item String syntax

@item gettext shorthand

@item gettext/ngettext functions

@item textdomain

@item bindtextdomain

@item setlocale

@item Prerequisite

@item Use or emulate GNU gettext

@item Extractor

@item Formatting with positions

@item Portability

@item po-mode marking
@end table

@end ignore

@node Data Formats
@chapter Other Data Formats

While the GNU gettext tools deal mainly with POT and PO files, they can
also manipulate a couple of other data formats.

@menu
* Internationalizable Data::    Internationalizable Data Formats
* Localized Data::              Localized Data Formats
@end menu

@node Internationalizable Data
@section Internationalizable Data Formats

Here is a list of other data formats which can be internationalized
using GNU gettext.

@menu
* POT::                         POT - Portable Object Template
* RST::                         Resource String Table
* Glade::                       Glade - GNOME user interface description
* GSettings::                   GSettings - GNOME user configuration schema
* AppData::                     AppData - freedesktop.org application description
* Preparing ITS Rules::         Preparing Rules for XML Internationalization
@end menu

@node POT
@subsection POT - Portable Object Template

@table @asis
@item RPMs
gettext

@item Ubuntu packages
gettext

@item File extension
@code{pot}, @code{po}

@item Extractor
@code{xgettext}
@end table

@node RST
@subsection Resource String Table
@cindex RST
@cindex RSJ

RST is the format of resource string table files of the Free Pascal compiler
versions older than 3.0.0.  RSJ is the new format of resource string table
files, created by the Free Pascal compiler version 3.0.0 or newer.

@table @asis
@item RPMs
fpk

@item Ubuntu packages
fp-compiler

@item File extension
@code{rst}, @code{rsj}

@item Extractor
@code{xgettext}, @code{rstconv}
@end table

@node Glade
@subsection Glade - GNOME user interface description

@table @asis
@item RPMs
glade, libglade, glade2, libglade2, intltool

@item Ubuntu packages
glade, libglade2-dev, intltool

@item File extension
@code{glade}, @code{glade2}, @code{ui}

@item Extractor
@code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract}
@end table

@node GSettings
@subsection GSettings - GNOME user configuration schema

@table @asis
@item RPMs
glib2

@item Ubuntu packages
libglib2.0-dev

@item File extension
@code{gschema.xml}

@item Extractor
@code{xgettext},
@code{intltool-extract}
@end table

@node AppData
@subsection AppData - freedesktop.org application description

This file format is specified in
@url{https://www.freedesktop.org/software/appstream/docs/}.

@table @asis
@item RPMs
appdata-tools, appstream, libappstream-glib, libappstream-glib-builder

@item Ubuntu packages
appdata-tools, appstream, libappstream-glib-dev

@item File extension
@code{appdata.xml}, @code{metainfo.xml}

@item Extractor
@code{xgettext}, @code{intltool-extract}, @code{itstool}
@end table

@node Preparing ITS Rules
@subsection Preparing Rules for XML Internationalization
@cindex preparing rules for XML translation

Marking translatable strings in an XML file is done through a separate
"rule" file, making use of the Internationalization Tag Set standard
(ITS, @uref{https://www.w3.org/TR/its20/}).  The currently supported ITS
data categories are: @samp{Translate}, @samp{Localization Note},
@samp{Elements Within Text}, and @samp{Preserve Space}.  In addition to
them, @code{xgettext} also recognizes the following extended data
categories:

@table @samp
@item Context

This data category associates @code{msgctxt} to the extracted text.  In
the global rule, the @code{contextRule} element contains the following:

@itemize
@item
A required @code{selector} attribute.  It contains an absolute selector
that selects the nodes to which this rule applies.

@item
A required @code{contextPointer} attribute that contains a relative
selector pointing to a node that holds the @code{msgctxt} value.

@item
An optional @code{textPointer} attribute that contains a relative
selector pointing to a node that holds the @code{msgid} value.
@end itemize

@item Escape Special Characters

This data category indicates whether the special XML characters
(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
references.  In the global rule, the @code{escapeRule} element contains
the following:

@itemize
@item
A required @code{selector} attribute.  It contains an absolute selector
that selects the nodes to which this rule applies.

@item
A required @code{escape} attribute with the value @code{yes} or @code{no}.
@end itemize

@item Extended Preserve Space

This data category extends the standard @samp{Preserve Space} data
category with the additional values @samp{trim} and @samp{paragraph}.
@samp{trim} means to remove the leading and trailing whitespace of the
content, but not to normalize whitespace in the middle.
@samp{paragraph} means to normalize the content but keep the paragraph
boundaries.  In the global rule, the @code{preserveSpaceRule} element
contains the following:

@itemize
@item
A required @code{selector} attribute.  It contains an absolute selector
that selects the nodes to which this rule applies.

@item
A required @code{space} attribute with the value @code{default},
@code{preserve}, @code{trim}, or @code{paragraph}.
@end itemize

@end table

All those extended data categories can only be expressed with global
rules, and the rule elements have to have the
@code{https://www.gnu.org/s/gettext/ns/its/extensions/1.0} namespace.
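
As an illustration of the extended data categories, here is a minimal
sketch of a rule file that uses two of them.  The @code{gt} namespace
prefix, the element names @code{label} and @code{description}, and the
@code{context} attribute are assumptions made for this example only; the
namespace URI and the rule attributes are the ones described above.

@example
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//label" translate="yes"/>
  <its:translateRule selector="//description" translate="yes"/>

  <!-- Hypothetical: take msgctxt from the 'context' attribute of
       each 'label' element that has one.  -->
  <gt:contextRule selector="//label[@@context]"
                  contextPointer="@@context"/>

  <!-- Hypothetical: normalize whitespace in 'description' elements
       but keep the paragraph boundaries.  -->
  <gt:preserveSpaceRule selector="//description" space="paragraph"/>
</its:rules>
@end example

@noindent
Because the extended categories are only available as global rules, both
of them appear as children of the @code{its:rules} element rather than
as attributes in the XML document being translated.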

Given the following XML document in a file @file{messages.xml}:

@example
<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>
@end example

To extract the first text content ("A translatable string"), but not the
second ("A non-translatable string"), the following ITS rules can be used:

@example
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="/messages" translate="no"/>
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- If 'p' has an attribute 'translatable' with the value 'no', then
       the content is not translatable.  -->
  <its:translateRule selector="//message/p[@@translatable = 'no']"
    translate="no"/>
</its:rules>
@end example

@samp{xgettext} needs another file called "locating rule" to associate
an ITS rule with an XML file.  If the above ITS file is saved as
@file{messages.its}, the locating rule would look like:

@example
<?xml version="1.0"?>
<locatingRules>
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
  </locatingRule>
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>
@end example

The @code{locatingRule} element must have a @code{pattern} attribute,
which denotes either a literal file name or a wildcard pattern of the
XML file@footnote{Note that the file name matching is done after
removing any @code{.in} suffix from the input file name.  Thus the
@code{pattern} attribute must not include a pattern matching @code{.in}.
For example, if the input file name is @file{foo.msg.in}, the pattern
should be either @code{*.msg} or just @code{*}, rather than
@code{*.in}.}.  The @code{locatingRule} element can have a child
@code{documentRule} element, which adds checks on the content of the XML
file.

The first rule matches any file with the @file{.xml} file extension, but
it only applies to XML files whose root element is @samp{<messages>}.

The second rule indicates that the same ITS rule file is also
applicable to any file with the @file{.msg} file extension.  The
optional @code{name} attribute of @code{locatingRule} allows choosing
rules by name, typically with @code{xgettext}'s @code{-L} option.

The associated ITS rule file is indicated by the @code{target} attribute
of @code{locatingRule} or @code{documentRule}.  If it is specified in a
@code{documentRule} element, the parent @code{locatingRule} shouldn't
have the @code{target} attribute.

Locating rule files must have the @file{.loc} file extension.  Both ITS
rule files and locating rule files must be installed in the
@file{$prefix/share/gettext/its} directory.  Once those files are
properly installed, @code{xgettext} can extract translatable strings
from the matching XML files.

@subsubsection Two Use-cases of Translated Strings in XML

For XML, there are two use-cases of translated strings.  One is the case
where the translated strings are directly consumed by programs, and the
other is the case where the translated strings are merged back to the
original XML document.  In the former case, special characters in the
extracted strings shouldn't be escaped, while they should in the latter
case.  To control whether to escape special characters, the @samp{Escape
Special Characters} data category can be used.
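
As a sketch of the second use-case, a rule file could request escaping
for the text of every @code{p} element as follows.  The @code{gt} prefix
and the selectors are assumptions chosen for this illustration; only the
namespace URI and the attributes of @code{escapeRule} are taken from the
description of the extended data categories.

@example
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- The translations are merged back into the XML document, so
       special characters are escaped.  -->
  <gt:escapeRule selector="//message/p" escape="yes"/>
</its:rules>
@end example

@noindent
For strings that are consumed directly by programs, the same rule would
carry @code{escape="no"} instead.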

To merge the translations, the @samp{msgfmt} program can be used with
the option @code{--xml}.  @xref{msgfmt Invocation}, for more details
about how one calls the @samp{msgfmt} program.  @samp{msgfmt}'s
@code{--xml} option doesn't perform character escaping, so translated
strings can have arbitrary XML constructs, such as elements for markup.

@c This is the template for new data formats.
@ignore

@ node
@ subsection

@table @asis
@item RPMs

@item Ubuntu packages

@item File extension

@item Extractor
@end table

@end ignore

@node Localized Data
@section Localized Data Formats

Here is a list of file formats that contain localized data and that the
GNU gettext tools can manipulate.

@menu
* Editable Message Catalogs::   Editable Message Catalogs
* Compiled Message Catalogs::   Compiled Message Catalogs
* Desktop Entry::               Desktop Entry files
* XML::                         XML files
@end menu

@node Editable Message Catalogs
@subsection Editable Message Catalogs

These file formats can be used with all of the @code{msg*} tools and with
the @code{xgettext} program.

If you just want to convert among these formats, you can use the
@code{msgcat} program (with the appropriate option) or the @code{xgettext}
program.
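
For instance, a conversion from PO to the other two editable formats
might look like the following sketch.  The file names and the sample
message are hypothetical; the options shown are the output-format
options of @code{msgcat}.

@example
# Create a tiny PO file to convert (hypothetical sample content).
cat > sample.po <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello"
msgstr "Bonjour"
EOF

# PO -> Java .properties
msgcat --properties-output -o sample.properties sample.po

# PO -> NeXTstep/GNUstep .strings
msgcat --stringtable-output -o sample.strings sample.po
@end example

@noindent
The reverse direction uses the corresponding input options,
@code{--properties-input} and @code{--stringtable-input}.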

@menu
* PO::                          PO - Portable Object
* Java .properties::            Java .properties
* GNUstep .strings::            NeXTstep/GNUstep .strings
@end menu

@node PO
@subsubsection PO - Portable Object

@table @asis
@item File extension
@code{po}
@end table

@node Java .properties
@subsubsection Java .properties

@table @asis
@item File extension
@code{properties}
@end table

@node GNUstep .strings
@subsubsection NeXTstep/GNUstep .strings

@table @asis
@item File extension
@code{strings}
@end table

@node Compiled Message Catalogs
@subsection Compiled Message Catalogs

These file formats can be created through @code{msgfmt} and converted back
to PO format through @code{msgunfmt}.

@menu
* MO::                          MO - Machine Object
* Java ResourceBundle::         Java ResourceBundle
* C# Satellite Assembly::       C# Satellite Assembly
* C# Resource::                 C# Resource
* Tcl message catalog::         Tcl message catalog
* Qt message catalog::          Qt message catalog
@end menu

@node MO
@subsubsection MO - Machine Object

@table @asis
@item File extension
@code{mo}
@end table

See section @ref{MO Files} for details.

@node Java ResourceBundle
@subsubsection Java ResourceBundle

@table @asis
@item File extension
@code{class}
@end table

For more information, see the section @ref{Java} and the examples
@code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing}.

@node C# Satellite Assembly
@subsubsection C# Satellite Assembly

@table @asis
@item File extension
@code{dll}
@end table

For more information, see the section @ref{C#}.

@node C# Resource
@subsubsection C# Resource

@table @asis
@item File extension
@code{resources}
@end table

For more information, see the section @ref{C#}.

@node Tcl message catalog
@subsubsection Tcl message catalog

@table @asis
@item File extension
@code{msg}
@end table

For more information, see the section @ref{Tcl} and the examples
@code{hello-tcl}, @code{hello-tcl-tk}.

@node Qt message catalog
@subsubsection Qt message catalog

@table @asis
@item File extension
@code{qm}
@end table

For more information, see the examples @code{hello-c++-qt} and
@code{hello-c++-kde}.

@node Desktop Entry
@subsection Desktop Entry files

The programmer produces a desktop entry file template with only the
English strings.  These strings get included in the POT file, by way of
@code{xgettext} (usually by listing the template in @code{po/POTFILES.in}).
The translators produce PO files, one for each language.  Finally, an
@code{msgfmt --desktop} invocation collects all the translations in the
desktop entry file.

For more information, see the example @code{hello-c-gnome3}.

@menu
* Icons::                       Handling icons
@end menu

@node Icons
@subsubsection How to handle icons in Desktop Entry files

Icons are generally locale dependent, for the following reasons:

@itemize @bullet
@item
Icons may contain signs that are considered rude in some cultures.  For
example, the high-five sign, in some cultures, is perceived as an
unfriendly ``stop'' sign.
@item
Icons may contain metaphors that are culture specific.  For example, a
mailbox in the U.S. looks different than mailboxes all around the world.
@item
Icons may need to be mirrored for right-to-left locales.
@item
Icons may contain text strings (a bad practice, but anyway).
@end itemize

However, icons are not covered by GNU gettext localization, because
@itemize @bullet
@item
Icons cannot be easily embedded in PO files,
@item
The need to localize an icon is rare, and the ability to do so in a PO
file would introduce translator mistakes.
@c https://lists.freedesktop.org/archives/xdg/2019-June/014168.html
@end itemize

Desktop Entry files may contain an @samp{Icon} property, and this
property is localizable.  If a translator wishes to localize an icon,
she should do so by bypassing the normal workflow with PO files:
@enumerate
@item
The translator contacts the package developers directly, sending them
the icon appropriate for her locale, with a request to change the
template file.
@item
The package developers add the icon file to their repository, and a
line
@smallexample
Icon[@var{locale}]=@var{icon_file_name}
@end smallexample
@noindent
to the template file.
@end enumerate
@noindent
This line remains in place when this template file is merged with the
translators' PO files, through @code{msgfmt}.

@node XML
@subsection XML files

See the section @ref{Preparing ITS Rules} and
@ref{msgfmt Invocation}, subsection ``XML mode operations''.

@node Conclusion
@chapter Concluding Remarks

We would like to conclude this GNU @code{gettext} manual by presenting
a history of the Translation Project so far.  We finally give
a few pointers for those who want to do further research or readings
about Native Language Support matters.

@menu
* History::                     History of GNU @code{gettext}
* The original ABOUT-NLS::      Historical introduction
* References::                  Related Readings
@end menu

@node History
@section History of GNU @code{gettext}
@cindex history of GNU @code{gettext}

Internationalization concerns and algorithms have been informally
and casually discussed for years in GNU, sometimes around GNU
@code{libc}, maybe around the incoming @code{Hurd}, or otherwise
(nobody clearly remembers).  And even then, when the work started for
real, this was somewhat independently of these previous discussions.

This all began in July 1994, when Patrick D'Cruze had the idea and
initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
He then asked Jim Meyering, the maintainer, how to get those changes
folded into an official release.  That first draft was full of
@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
nicer ways.  Patrick and Jim shared some tries and experimentations
in this area.  Then, feeling that this might eventually have a deeper
impact on GNU, Jim wanted to know what standards were, and contacted
Richard Stallman, who very quickly and verbally described an overall
design for what was meant to become @code{glocale}, at that time.

Jim implemented @code{glocale} and got a lot of exhausting feedback
from Patrick and Richard, of course, but also from Mitchum DSouza
(who wrote a @code{catgets}-like package), Roland McGrath, maybe David
MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
pulling in various directions, not always compatible, to the extent
that after a couple of test releases, @code{glocale} was torn apart.
In particular, Paul Eggert -- always keeping an eye on developments
in Solaris -- advocated the use of the @code{gettext} API over
@code{glocale}'s @code{catgets}-based API.

While Jim took some distance and time and became dad for a second
time, Roland wanted to get GNU @code{libc} internationalized, and
got Ulrich Drepper involved in that project.  Instead of starting
from @code{glocale}, Ulrich rewrote something from scratch, but
more conforming to the set of guidelines that emerged out of the
@code{glocale} effort.  Then, Ulrich got people from the previous
forum to involve themselves in this new project, and the switch
from @code{glocale} to what was first named @code{msgutils}, renamed
@code{nlsutils}, and later @code{gettext}, became officially accepted
by Richard in May 1995 or so.

Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
in April 1995.  The first official release of the package, including
PO mode, occurred in July 1995, and was numbered 0.7.  Other people
contributed to the effort by providing a discussion forum around
Ulrich, writing little pieces of code, or testing.  These are quoted
in the @code{THANKS} file which comes with the GNU @code{gettext}
distribution.

While this was being done, Fran@,{c}ois adapted half a dozen
GNU packages to @code{glocale} first, then later to @code{gettext},
putting them in pretest, so providing along the way an effective
user environment for fine tuning the evolving tools.  He also took
the responsibility of organizing and coordinating the Translation
Project.
After nearly a year of informal exchanges between people from
many countries, translator teams started to exist in May 1995, through
the creation and support by Patrick D'Cruze of twenty unmoderated
mailing lists for that many native languages, and two moderated
lists: one for reaching all teams at once, the other for reaching
all willing maintainers of internationalized free software packages.

Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
of Greg McGary, as a kind of contribution to Ulrich's package.
He also gave a hand with the GNU @code{gettext} Texinfo manual.

In 1997, Ulrich Drepper released the GNU libc 2.0, which included the
@code{gettext}, @code{textdomain} and @code{bindtextdomain} functions.

In 2000, Ulrich Drepper added plural form handling (the @code{ngettext}
function) to GNU libc.  Later, in 2001, he released GNU libc 2.2.x,
which is the first free C library with full internationalization support.

Ulrich being quite busy in his role of General Maintainer of GNU libc,
he handed over the GNU @code{gettext} maintenance to Bruno Haible in
2000.  Bruno added the plural form handling to the tools as well, added
support for UTF-8 and CJK locales, and wrote a few new tools for
manipulating PO files.

@include nls.texi

@node References
@section Related Readings
@cindex related reading
@cindex bibliography

@strong{ NOTE: } This documentation section is outdated and needs to be
revised.

Eugene H.
Dorr (@file{dorre@@well.com}) maintains an interesting
bibliography on internationalization matters, called
@cite{Internationalization Reference List}, which is available as:
@example
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
@end example

Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
Internationalisation}. This FAQ discusses writing programs which
can handle different language conventions, character sets, etc.;
and is applicable to all character set encodings, with particular
emphasis on @w{ISO 8859-1}. It is regularly published in Usenet
groups @file{comp.unix.questions}, @file{comp.std.internat},
@file{comp.software.international}, @file{comp.lang.c},
@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
and @file{news.answers}. The home location of this document is:
@example
ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
@end example

Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
over the responsibility of maintaining it. It may be found as:
@example
ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
     ...locale-tutorial-0.8.txt.gz
@end example
@noindent
This site is mirrored in:
@example
ftp://ftp.ibp.fr/pub/linux/sunsite/
@end example

A French version of the same tutorial should be findable at:
@example
ftp://ftp.ibp.fr/pub/linux/french/docs/
@end example
@noindent
together with French translations of many Linux-related documents.

@node Language Codes
@appendix Language Codes
@cindex language codes
@cindex ISO 639

The @w{ISO 639} standard defines two-letter codes for many languages, and
three-letter codes for more rarely used languages.
All abbreviations for languages used in the Translation Project should
come from this standard.

@menu
* Usual Language Codes::        Two-letter ISO 639 language codes
* Rare Language Codes::         Three-letter ISO 639 language codes
@end menu

@node Usual Language Codes
@appendixsec Usual Language Codes

For the commonly used languages, the @w{ISO 639-1} standard defines two-letter
codes.

@table @samp
@include iso-639.texi
@end table

@node Rare Language Codes
@appendixsec Rare Language Codes

For rarely used languages, the @w{ISO 639-2} standard defines three-letter
codes. Here is the current list, reduced to only living languages with at least
one million speakers.

@table @samp
@include iso-639-2.texi
@end table

@node Country Codes
@appendix Country Codes
@cindex country codes
@cindex ISO 3166

The @w{ISO 3166} standard defines two-character codes for many countries
and territories. All abbreviations for countries used in the Translation
Project should come from this standard.

@table @samp
@include iso-3166.texi
@end table

@node Licenses
@appendix Licenses
@cindex Licenses

The files of this package are covered by the licenses indicated in each
particular file or directory. Here is a summary:

@itemize @bullet
@item
The @code{libintl} and @code{libasprintf} libraries are covered by the
GNU Lesser General Public License (LGPL).
A copy of the license is included in @ref{GNU LGPL}.

@item
The executable programs of this package and the @code{libgettextpo} library
are covered by the GNU General Public License (GPL).
A copy of the license is included in @ref{GNU GPL}.

@item
This manual is free documentation. It is dually licensed under the
GNU FDL and the GNU GPL. This means that you can redistribute this
manual under either of these two licenses, at your choice.
@*
This manual is covered by the GNU FDL. Permission is granted to copy,
distribute and/or modify this document under the terms of the
GNU Free Documentation License (FDL), either version 1.2 of the
License, or (at your option) any later version published by the
Free Software Foundation (FSF); with no Invariant Sections, with no
Front-Cover Text, and with no Back-Cover Texts.
A copy of the license is included in @ref{GNU FDL}.
@*
This manual is covered by the GNU GPL. You can redistribute it and/or
modify it under the terms of the GNU General Public License (GPL), either
version 2 of the License, or (at your option) any later version published
by the Free Software Foundation (FSF).
A copy of the license is included in @ref{GNU GPL}.
@end itemize

@menu
* GNU GPL::                     GNU General Public License
* GNU LGPL::                    GNU Lesser General Public License
* GNU FDL::                     GNU Free Documentation License
@end menu

@page
@node GNU GPL
@appendixsec GNU GENERAL PUBLIC LICENSE
@cindex GPL, GNU General Public License
@cindex License, GNU GPL
@include gpl.texi
@page
@node GNU LGPL
@appendixsec GNU LESSER GENERAL PUBLIC LICENSE
@cindex LGPL, GNU Lesser General Public License
@cindex License, GNU LGPL
@include lgpl.texi
@page
@node GNU FDL
@appendixsec GNU Free Documentation License
@cindex FDL, GNU Free Documentation License
@cindex License, GNU FDL
@include fdl.texi

@node Program Index
@unnumbered Program Index

@printindex pg

@node Option Index
@unnumbered Option Index

@printindex op

@node Variable Index
@unnumbered Variable Index

@printindex vr

@node PO Mode Index
@unnumbered PO Mode Index

@printindex em

@node Autoconf Macro Index
@unnumbered Autoconf Macro Index

@printindex am

@node Index
@unnumbered General Index

@printindex cp

@bye

@c Local variables:
@c texinfo-column-for-description: 32
@c End:
