@c This file is part of the GNU gettext manual. @c Copyright (C) 1995-2020 Free Software Foundation, Inc. @c See the file gettext.texi for copying conditions. @node Perl @subsection Perl @cindex Perl @table @asis @item RPMs perl @item Ubuntu packages perl, libintl-perl @item File extension @code{pl}, @code{PL}, @code{pm}, @code{perl}, @code{cgi} @item String syntax @itemize @bullet @item @code{"abc"} @item @code{'abc'} @item @code{qq (abc)} @item @code{q (abc)} @item @code{qr /abc/} @item @code{qx (/bin/date)} @item @code{/pattern match/} @item @code{?pattern match?} @item @code{s/substitution/operators/} @item @code{$tied_hash@{"message"@}} @item @code{$tied_hash_reference->@{"message"@}} @item etc., issue the command @samp{man perlsyn} for details @end itemize @item gettext shorthand @code{__} (double underscore) @item gettext/ngettext functions @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, @code{dngettext}, @code{dcngettext}, @code{pgettext}, @code{dpgettext}, @code{dcpgettext}, @code{npgettext}, @code{dnpgettext}, @code{dcnpgettext} @item textdomain @code{textdomain} function @item bindtextdomain @code{bindtextdomain} function @item bind_textdomain_codeset @code{bind_textdomain_codeset} function @item setlocale Use @code{setlocale (LC_ALL, "");} @item Prerequisite @code{use POSIX;} @*@code{use Locale::TextDomain;} (included in the package libintl-perl which is available on the Comprehensive Perl Archive Network CPAN, https://www.cpan.org/). @item Use or emulate GNU gettext platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext @item Extractor @code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -kN__n:1,2 -k__p:1c,2 -k__np:1c,2,3 -kN__p:1c,2 -kN__np:1c,2,3} @item Formatting with positions Both kinds of format strings support formatting with positions. @*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer) @*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)} @item Portability The @code{libintl-perl} package is platform independent but is not part of the Perl core. The programmer is responsible for providing a dummy implementation of the required functions if the package is not installed on the target system. @item po-mode marking --- @item Documentation Included in @code{libintl-perl}, available on CPAN (https://www.cpan.org/). @end table An example is available in the @file{examples} directory: @code{hello-perl}. @cindex marking Perl sources The @code{xgettext} parser backend for Perl differs significantly from the parser backends for other programming languages, just as Perl itself differs significantly from other programming languages. The Perl parser backend offers many more string marking facilities than the other backends but it also has some Perl specific limitations, the worst probably being its imperfectness. @menu * General Problems:: General Problems Parsing Perl Code * Default Keywords:: Which Keywords Will xgettext Look For? * Special Keywords:: How to Extract Hash Keys * Quote-like Expressions:: What are Strings And Quote-like Expressions? * Interpolation I:: Invalid String Interpolation * Interpolation II:: Valid String Interpolation * Parentheses:: When To Use Parentheses * Long Lines:: How To Grok with Long Lines * Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work @end menu @node General Problems @subsubsection General Problems Parsing Perl Code It is often heard that only Perl can parse Perl. This is not true. Perl cannot be @emph{parsed} at all, it can only be @emph{executed}. Perl has various built-in ambiguities that can only be resolved at runtime. The following example may illustrate one common problem: @example print gettext "Hello World!"; @end example Although this example looks like a bullet-proof case of a function invocation, it is not: @example open gettext, ">testfile" or die; print gettext "Hello world!" @end example In this context, the string @code{gettext} looks more like a file handle. But not necessarily: @example use Locale::Messages qw (:libintl_h); open gettext ">testfile" or die; print gettext "Hello world!"; @end example Now, the file is probably syntactically incorrect, provided that the module @code{Locale::Messages} found first in the Perl include path exports a function @code{gettext}. But what if the module @code{Locale::Messages} really looks like this? @example use vars qw (*gettext); 1; @end example In this case, the string @code{gettext} will be interpreted as a file handle again, and the above example will create a file @file{testfile} and write the string ``Hello world!'' into it. Even advanced control flow analysis will not really help: @example if (0.5 < rand) @{ eval "use Sane"; @} else @{ eval "use InSane"; @} print gettext "Hello world!"; @end example If the module @code{Sane} exports a function @code{gettext} that does what we expect, and the module @code{InSane} opens a file for writing and associates the @emph{handle} @code{gettext} with this output stream, we are clueless again about what will happen at runtime. It is completely unpredictable. The truth is that Perl has so many ways to fill its symbol table at runtime that it is impossible to interpret a particular piece of code without executing it. Of course, @code{xgettext} will not execute your Perl sources while scanning for translatable strings, but rather use heuristics in order to guess what you meant. Another problem is the ambiguity of the slash and the question mark. Their interpretation depends on the context: @example # A pattern match. print "OK\n" if /foobar/; # A division. print 1 / 2; # Another pattern match. print "OK\n" if ?foobar?; # Conditional. print $x ? "foo" : "bar"; @end example The slash may either act as the division operator or introduce a pattern match, whereas the question mark may act as the ternary conditional operator or as a pattern match, too. Other programming languages like @code{awk} present similar problems, but the consequences of a misinterpretation are particularly nasty with Perl sources. In @code{awk} for instance, a statement can never exceed one line and the parser can recover from a parsing error at the next newline and interpret the rest of the input stream correctly. Perl is different, as a pattern match is terminated by the next appearance of the delimiter (the slash or the question mark) in the input stream, regardless of the semantic context. If a slash is really a division sign but mis-interpreted as a pattern match, the rest of the input file is most probably parsed incorrectly. There are certain cases, where the ambiguity cannot be resolved at all: @example $x = wantarray ? 1 : 0; @end example The Perl built-in function @code{wantarray} does not accept any arguments. The Perl parser therefore knows that the question mark does not start a regular expression but is the ternary conditional operator. @example sub wantarrays @{@} $x = wantarrays ? 1 : 0; @end example Now the situation is different. The function @code{wantarrays} takes a variable number of arguments (like any non-prototyped Perl function). The question mark is now the delimiter of a pattern match, and hence the piece of code does not compile. @example sub wantarrays() @{@} $x = wantarrays ? 1 : 0; @end example Now the function is prototyped, Perl knows that it does not accept any arguments, and the question mark is therefore interpreted as the ternaray operator again. But that unfortunately outsmarts @code{xgettext}. The Perl parser in @code{xgettext} cannot know whether a function has a prototype and what that prototype would look like. It therefore makes an educated guess. If a function is known to be a Perl built-in and this function does not accept any arguments, a following question mark or slash is treated as an operator, otherwise as the delimiter of a following regular expression. The Perl built-ins that do not accept arguments are @code{wantarray}, @code{fork}, @code{time}, @code{times}, @code{getlogin}, @code{getppid}, @code{getpwent}, @code{getgrent}, @code{gethostent}, @code{getnetent}, @code{getprotoent}, @code{getservent}, @code{setpwent}, @code{setgrent}, @code{endpwent}, @code{endgrent}, @code{endhostent}, @code{endnetent}, @code{endprotoent}, and @code{endservent}. If you find that @code{xgettext} fails to extract strings from portions of your sources, you should therefore look out for slashes and/or question marks preceding these sections. You may have come across a bug in @code{xgettext}'s Perl parser (and of course you should report that bug). In the meantime you should consider to reformulate your code in a manner less challenging to @code{xgettext}. In particular, if the parser is too dumb to see that a function does not accept arguments, use parentheses: @example $x = somefunc() ? 1 : 0; $y = (somefunc) ? 1 : 0; @end example In fact the Perl parser itself has similar problems and warns you about such constructs. @node Default Keywords @subsubsection Which keywords will xgettext look for? @cindex Perl default keywords Unless you instruct @code{xgettext} otherwise by invoking it with one of the options @code{--keyword} or @code{-k}, it will recognize the following keywords in your Perl sources: @itemize @bullet @item @code{gettext} @item @code{dgettext:2} The second argument will be extracted. @item @code{dcgettext:2} The second argument will be extracted. @item @code{ngettext:1,2} The first (singular) and the second (plural) argument will be extracted. @item @code{dngettext:2,3} The second (singular) and the third (plural) argument will be extracted. @item @code{dcngettext:2,3} The second (singular) and the third (plural) argument will be extracted. @item @code{pgettext:1c,2} The first (message context) and the second argument will be extracted. @item @code{dpgettext:2c,3} The second (message context) and the third argument will be extracted. @item @code{dcpgettext:2c,3} The second (message context) and the third argument will be extracted. @item @code{npgettext:1c,2,3} The first (message context), second (singular), and third (plural) argument will be extracted. @item @code{dnpgettext:2c,3,4} The second (message context), third (singular), and fourth (plural) argument will be extracted. @item @code{dcnpgettext:2c,3,4} The second (message context), third (singular), and fourth (plural) argument will be extracted. @item @code{gettext_noop} @item @code{%gettext} The keys of lookups into the hash @code{%gettext} will be extracted. @item @code{$gettext} The keys of lookups into the hash reference @code{$gettext} will be extracted. @end itemize @node Special Keywords @subsubsection How to Extract Hash Keys @cindex Perl special keywords for hash-lookups Translating messages at runtime is normally performed by looking up the original string in the translation database and returning the translated version. The ``natural'' Perl implementation is a hash lookup, and, of course, @code{xgettext} supports such practice. @example print __"Hello world!"; print $__@{"Hello world!"@}; print $__->@{"Hello world!"@}; print $$__@{"Hello world!"@}; @end example The above four lines all do the same thing. The Perl module @code{Locale::TextDomain} exports by default a hash @code{%__} that is tied to the function @code{__()}. It also exports a reference @code{$__} to @code{%__}. If an argument to the @code{xgettext} option @code{--keyword}, resp. @code{-k} starts with a percent sign, the rest of the keyword is interpreted as the name of a hash. If it starts with a dollar sign, the rest of the keyword is interpreted as a reference to a hash. Note that you can omit the quotation marks (single or double) around the hash key (almost) whenever Perl itself allows it: @example print $gettext@{Error@}; @end example The exact rule is: You can omit the surrounding quotes, when the hash key is a valid C (!) identifier, i.e.@: when it starts with an underscore or an ASCII letter and is followed by an arbitrary number of underscores, ASCII letters or digits. Other Unicode characters are @emph{not} allowed, regardless of the @code{use utf8} pragma. @node Quote-like Expressions @subsubsection What are Strings And Quote-like Expressions? @cindex Perl quote-like expressions Perl offers a plethora of different string constructs. Those that can be used either as arguments to functions or inside braces for hash lookups are generally supported by @code{xgettext}. @itemize @bullet @item @strong{double-quoted strings} @* @example print gettext "Hello World!"; @end example @item @strong{single-quoted strings} @* @example print gettext 'Hello World!'; @end example @item @strong{the operator qq} @* @example print gettext qq |Hello World!|; print gettext qq >; @end example The operator @code{qq} is fully supported. You can use arbitrary delimiters, including the four bracketing delimiters (round, angle, square, curly) that nest. @item @strong{the operator q} @* @example print gettext q |Hello World!|; print gettext q >; @end example The operator @code{q} is fully supported. You can use arbitrary delimiters, including the four bracketing delimiters (round, angle, square, curly) that nest. @item @strong{the operator qx} @* @example print gettext qx ;LANGUAGE=C /bin/date; print gettext qx [/usr/bin/ls | grep '^[A-Z]*']; @end example The operator @code{qx} is fully supported. You can use arbitrary delimiters, including the four bracketing delimiters (round, angle, square, curly) that nest. The example is actually a useless use of @code{gettext}. It will invoke the @code{gettext} function on the output of the command specified with the @code{qx} operator. The feature was included in order to make the interface consistent (the parser will extract all strings and quote-like expressions). @item @strong{here documents} @* @example @group print gettext <<'EOF'; program not found in $PATH EOF print ngettext <My Homepage EOF @end example The parser will extract the entire here document, and it will appear entirely in the resulting PO file, including the JavaScript snippet embedded in the HTML code. If you exaggerate with constructs like the above, you will run the risk that the translators of your package will look out for a less challenging project. You should consider an alternative expression here: @example print <$gettext@{"My Homepage"@} EOF @end example Only the translatable portions of the code will be extracted here, and the resulting PO file will begrudgingly improve in terms of readability. You can interpolate hash lookups in all strings or quote-like expressions that are subject to interpolation (see the manual page @samp{man perlop} for details). Double interpolation is invalid, however: @example # TRANSLATORS: Replace "the earth" with the name of your planet. print gettext qq@{Welcome to $gettext->@{"the earth"@}@}; @end example The @code{qq}-quoted string is recognized as an argument to @code{xgettext} in the first place, and checked for invalid variable interpolation. The dollar sign of hash-dereferencing will therefore terminate the parser with an ``invalid interpolation'' error. It is valid to interpolate hash lookups in regular expressions: @example if ($var =~ /$gettext@{"the earth"@}/) @{ print gettext "Match!\n"; @} s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g; @end example @node Parentheses @subsubsection When To Use Parentheses @cindex Perl parentheses In Perl, parentheses around function arguments are mostly optional. @code{xgettext} will always assume that all recognized keywords (except for hashes and hash references) are names of properly prototyped functions, and will (hopefully) only require parentheses where Perl itself requires them. All constructs in the following example are therefore ok to use: @example @group print gettext ("Hello World!\n"); print gettext "Hello World!\n"; print dgettext ($package => "Hello World!\n"); print dgettext $package, "Hello World!\n"; # The "fat comma" => turns the left-hand side argument into a # single-quoted string! print dgettext smellovision => "Hello World!\n"; # The following assignment only works with prototyped functions. # Otherwise, the functions will act as "greedy" list operators and # eat up all following arguments. my $anonymous_hash = @{ planet => gettext "earth", cakes => ngettext "one cake", "several cakes", $n, still => $works, @}; # The same without fat comma: my $other_hash = @{ 'planet', gettext "earth", 'cakes', ngettext "one cake", "several cakes", $n, 'still', $works, @}; # Parentheses are only significant for the first argument. print dngettext 'package', ("one cake", "several cakes", $n), $discarded; @end group @end example @node Long Lines @subsubsection How To Grok with Long Lines @cindex Perl long lines The necessity of long messages can often lead to a cumbersome or unreadable coding style. Perl has several options that may prevent you from writing unreadable code, and @code{xgettext} does its best to do likewise. This is where the dot operator (the string concatenation operator) may come in handy: @example @group print gettext ("This is a very long" . " message that is still" . " readable, because" . " it is split into" . " multiple lines.\n"); @end group @end example Perl is smart enough to concatenate these constant string fragments into one long string at compile time, and so is @code{xgettext}. You will only find one long message in the resulting POT file. Note that the future Perl 6 will probably use the underscore (@samp{_}) as the string concatenation operator, and the dot (@samp{.}) for dereferencing. This new syntax is not yet supported by @code{xgettext}. If embedded newline characters are not an issue, or even desired, you may also insert newline characters inside quoted strings wherever you feel like it: @example @group print gettext ("In HTML output embedded newlines are generally no problem, since adjacent whitespace is always rendered into a single space character."); @end group @end example You may also consider to use here documents: @example @group print gettext <In HTML output embedded newlines are generally no problem, since adjacent whitespace is always rendered into a single space character. EOF @end group @end example Please do not forget that the line breaks are real, i.e.@: they translate into newline characters that will consequently show up in the resulting POT file. @node Perl Pitfalls @subsubsection Bugs, Pitfalls, And Things That Do Not Work @cindex Perl pitfalls The foregoing sections should have proven that @code{xgettext} is quite smart in extracting translatable strings from Perl sources. Yet, some more or less exotic constructs that could be expected to work, actually do not work. One of the more relevant limitations can be found in the implementation of variable interpolation inside quoted strings. Only simple hash lookups can be used there: @example print </gettext ("Sunday")/e; @end example The modifier @code{e} will cause the substitution to be interpreted as an evaluable statement. Consequently, at runtime the function @code{gettext()} is called, but again, the parser fails to extract the string ``Sunday''. Use a temporary variable as a simple workaround if you really happen to need this feature: @example my $sunday = gettext "Sunday"; s//$sunday/; @end example Hash slices would also be handy but are not recognized: @example my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'@}; # Or even: @@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday Friday Saturday) @}; @end example This is perfectly valid usage of the tied hash @code{%gettext} but the strings are not recognized and therefore will not be extracted. Another caveat of the current version is its rudimentary support for non-ASCII characters in identifiers. You may encounter serious problems if you use identifiers with characters outside the range of 'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'. Maybe some of these missing features will be implemented in future versions, but since you can always make do without them at minimal effort, these todos have very low priority. A nasty problem are brace format strings that already contain braces as part of the normal text, for example the usage strings typically encountered in programs: @example die "usage: $0 @{OPTIONS@} FILENAME...\n"; @end example If you want to internationalize this code with Perl brace format strings, you will run into a problem: @example die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0); @end example Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}} is not and should probably be translated. Yet, there is no way to teach the Perl parser in @code{xgettext} to recognize the first one, and leave the other one alone. There are two possible work-arounds for this problem. If you are sure that your program will run under Perl 5.8.0 or newer (these Perl versions handle positional parameters in @code{printf()}) or if you are sure that the translator will not have to reorder the arguments in her translation -- for example if you have only one brace placeholder in your string, or if it describes a syntax, like in this one --, you can mark the string as @code{no-perl-brace-format} and use @code{printf()}: @example # xgettext: no-perl-brace-format die sprintf ("usage: %s @{OPTIONS@} FILENAME...\n", $0); @end example If you want to use the more portable Perl brace format, you will have to do put placeholders in place of the literal braces: @example die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n", program => $0, '[' => '@{', ']' => '@}'); @end example Perl brace format strings know no escaping mechanism. No matter how this escaping mechanism looked like, it would either give the programmer a hard time, make translating Perl brace format strings heavy-going, or result in a performance penalty at runtime, when the format directives get executed. Most of the time you will happily get along with @code{printf()} for this special case.