---
layout: default
title: Coding Guidelines
nav_order: 1
parent: Misc
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Coding Guidelines
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

This section provides the guidelines for developing C and C++ code, based on the
coding conventions used by ICU programmers in the creation of the ICU library.

## Details about ICU Error Codes

When calling an ICU API function that takes an error code pointer (C) or
reference (C++), the caller passes in a `UErrorCode` variable. This variable is
allocated by the caller and must pass the test `U_SUCCESS()` before the function
call. Otherwise, the function returns immediately without doing any work.
Normally, an error code variable is initialized with `U_ZERO_ERROR`.

`UErrorCode` is passed around and used this way, instead of using C++ exceptions,
for the following reasons:

* It is useful in the same form for C also
* Some C++ compilers do not support exceptions

> :point_right: **Note**: *This error code mechanism, in fact, works similarly to
> exceptions. If users call several ICU functions in a sequence, as soon as one
> sets a failure code, the functions that follow return without doing any work.
> This procedure prevents the API function from processing data that is not valid
> in the sequence of function calls and relieves the caller from checking the
> error code after each call. It is somewhat similar to how an exception
> terminates a function block or try block early.*

The following code shows the inside of an ICU function implementation:

```c++
U_CAPI const UBiDiLevel * U_EXPORT2
ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
    int32_t start, length;

    if(U_FAILURE(*pErrorCode)) {
        return NULL;
    } else if(pBiDi==NULL || (length=pBiDi->length)<=0) {
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return NULL;
    }

    ...
    return result;
}
```

Note: We have decided that we do not want to test for `pErrorCode==NULL`. Some
existing code does this, but new code should not.

Note: *Callers* (as opposed to implementers) of ICU APIs can simplify their code
by defining and using a subclass of `icu::ErrorCode`. ICU implementers can use the
`IcuTestErrorCode` class in intltest code.

It is not necessary to check for `U_FAILURE()` immediately before calling a
function that takes a `UErrorCode` parameter, because that function is supposed to
check for failure. Exception: If the failure comes from object allocation or
creation, then you probably have a `NULL` object pointer and must not call any
method on that object, not even one with a `UErrorCode` parameter.
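The chaining behavior described above can be sketched without ICU headers. The following is only an illustration, not ICU code: `ErrCode`, `failed()`, `parseStep()`, `formatStep()`, and `runBoth()` are hypothetical stand-ins for `UErrorCode`, `U_FAILURE()`, and a pair of ICU functions.

```c++
#include <cstdint>

// Hypothetical stand-ins for UErrorCode and U_FAILURE(); not the ICU definitions.
enum ErrCode { ZERO_ERROR = 0, ILLEGAL_ARGUMENT_ERROR = 1 };
inline bool failed(ErrCode code) { return code > ZERO_ERROR; }

// Each function returns immediately if a failure code is already set,
// so a caller may chain calls and test the code once at the end.
int32_t parseStep(int32_t input, ErrCode &errorCode) {
    if (failed(errorCode)) { return 0; }
    if (input < 0) {
        errorCode = ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    return input * 2;
}

int32_t formatStep(int32_t value, ErrCode &errorCode) {
    if (failed(errorCode)) { return 0; }
    return value + 1;
}

// Chains both steps; no failure check is needed between the calls.
int32_t runBoth(int32_t input, ErrCode &errorCode) {
    int32_t parsed = parseStep(input, errorCode);
    return formatStep(parsed, errorCode);
}
```

A caller would initialize the code to the zero value, call `runBoth()`, and inspect the code once afterwards; if `parseStep()` fails, `formatStep()` does nothing.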

### Sample Function with Error Checking

```c++
    U_CAPI int32_t U_EXPORT2
    uplrules_select(const UPluralRules *uplrules,   // Do not check
                                                    // "this"/uplrules vs. NULL.
                    double number,
                    UChar *keyword, int32_t capacity,
                    UErrorCode *status)             // Do not check status!=NULL.
    {
        if (U_FAILURE(*status)) {                   // Do check for U_FAILURE()
                                                    // before setting *status
            return 0;                               // or calling UErrorCode-less
                                                    // select(number).
        }
        if (keyword == NULL ? capacity != 0 : capacity < 0) {
                                                    // Standard destination buffer
                                                    // checks.
            *status = U_ILLEGAL_ARGUMENT_ERROR;
            return 0;
        }
        UnicodeString result = ((PluralRules*)uplrules)->select(number);
        return result.extract(keyword, capacity, *status);
    }
```

### New API Functions

If the API function is non-const, then it should have a `UErrorCode` parameter.
(Not the other way around: Some const functions may need a `UErrorCode` as well.)

Default C++ assignment operators and copy constructors should not be used (they
should be declared private and not implemented). Instead, define an `assign(Class
&other, UErrorCode &errorCode)` function. Normal constructors are fine, and
should have a `UErrorCode` parameter.

### Warning Codes

Some `UErrorCode` values do not indicate a failure but an additional informational
return value. Their enum constants have the `_WARNING` suffix and they pass the
`U_SUCCESS()` test.

However, experience has shown that they are problematic: They can get lost
easily because subsequent function calls may set their own "warning" codes or
may reset a `UErrorCode` to `U_ZERO_ERROR`.

The source of the problem is that the `UErrorCode` mechanism is designed to mimic
C++/Java exceptions. It prevents ICU function execution after a failure code is
set, but like exceptions it does not work well for non-failure information
passing.

Therefore, we recommend using warning codes very carefully:

* Try not to rely on any warning codes.
* Use real APIs to get the same information if possible.
  For example, when a string is completely written but cannot be
  NUL-terminated, then `U_STRING_NOT_TERMINATED_WARNING` indicates this, but so
  does the returned destination string length (which will have the same value
  as the destination capacity in this case). Checking the string length is
  safer than checking the warning code. (It is even safer to not rely on
  NUL-terminated strings but to use the length.)
* If warning codes must be used, then it is best to set the `UErrorCode` to
  `U_ZERO_ERROR` immediately before calling the function in question, and to
  check for the expected warning code immediately after the function returns.
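The length-versus-capacity check from the second bullet can be sketched as follows. This is a simplified illustration under stated assumptions: `copyOut()`, `isComplete()`, and `isTerminated()` are hypothetical helpers that only imitate the convention of returning the full string length, and they are not ICU functions.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an ICU-style extraction function: copies up to
// `capacity` chars, NUL-terminates only if there is room, and returns the
// full length of the source string.
int32_t copyOut(const char *src, char *dest, int32_t capacity) {
    int32_t length = static_cast<int32_t>(std::strlen(src));
    int32_t n = length < capacity ? length : capacity;
    std::memcpy(dest, src, static_cast<std::size_t>(n));
    if (length < capacity) { dest[length] = 0; }
    return length;
}

// The output is complete if the length fits into the buffer.
bool isComplete(int32_t length, int32_t capacity) { return length <= capacity; }

// The output is NUL-terminated only if there was room for one more unit;
// length == capacity means "complete but not terminated".
bool isTerminated(int32_t length, int32_t capacity) { return length < capacity; }
```

Comparing the returned length with the capacity gives the same information as the warning code, and it cannot be overwritten by a later call.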

Future versions of ICU will not introduce new warning codes, and will provide
real API replacements for all existing warning codes.

### Bogus Objects

Some objects, for example `UnicodeString` and `UnicodeSet`, can become "bogus". This
is used when methods that create or modify the object fail (mostly due to an
out-of-memory condition) but do not take a `UErrorCode` parameter and can
therefore not otherwise report the failure.

* A bogus object appears as empty.
* A bogus object cannot be modified except with assignment-like functions.
* The bogus state of one object does not transfer to another. For example,
  adding a bogus `UnicodeString` to a `UnicodeSet` does not make the set bogus.
  (It would be hard to make propagation consistent and test it well. Also,
  propagation among bogus states and error codes would be messy.)
* If a bogus object is passed into a function that does have a `UErrorCode`
  parameter, then the function should set the `U_ILLEGAL_ARGUMENT_ERROR` code.

## API Documentation

"API" means any public class, function, or constant.

### API status tag

Aside from documenting an API's functionality, parameters, return values etc. we
also mark every API with whether it is `@draft`, `@stable`, `@deprecated` or
`@internal`. (Where `@internal` is used when something is not actually supported
API but needs to be physically public anyway.) A new API is usually marked with
"`@draft ICU 4.8`". For details of how we mark APIs see the "ICU API
compatibility" section of the [ICU Architectural Design](../design.md) page. In
Java, also see existing @draft APIs for complete examples.

Functions that override a base class or interface definition take the API status
of the base class function. For C++, use the `@copydoc base::function()` tag to
copy both the description and the API status from the base function definition.
For Java methods the status tags must be added by hand; use the `{@inheritDoc}`
JavaDoc tag to pick up the rest of the base function documentation.
Documentation should not be manually replicated in overriding functions; it is
too hard to keep multiple copies synchronized.

The policy for the treatment of status tags in overriding functions was
introduced with ICU 64 for C++, and with ICU 59 for Java. Earlier code may
deviate.

### Coding Example

Coding examples help users to understand the usage of each API. Whenever
possible, it is encouraged to embed a code snippet illustrating the usage of an
API along with the functional specification.

#### Embedding Coding Examples in ICU4J - JCite

Since ICU4J 49M2, the ICU4J ant build target "doc" utilizes an external tool
called [JCite](https://arrenbrecht.ch/jcite/). The tool allows us to cite a
fragment of existing source code in a JavaDoc comment using a tag. For example,
`{@.jcite com.ibm.icu.samples.util.timezone.BasicTimeZoneExample:---getNextTransitionExample}`
will be replaced by the fragment of code marked by comment lines
`// ---getNextTransitionExample` in `BasicTimeZoneExample.java` in package
`com.ibm.icu.samples.util.timezone`. When embedding a code snippet using JCite,
we recommend following these guidelines:

* Sample code should be placed in the `<icu4j_root>/samples/src` directory,
  although you can cite any source fragment from source files in
  `<icu4j_root>/demos/src`, `<icu4j_root>/main/core/*/src`,
  `<icu4j_root>/main/test/*/src`.
* Sample code should use the package name
  `com.ibm.icu.samples.<subpackage>.<facility>`. `<subpackage>` corresponds
  to the target ICU API class's package, that is, one of lang/math/text/util.
  `<facility>` is the name of the facility, which is usually the base class of
  the service. For example, use package `com.ibm.icu.samples.text.dateformat` for
  samples related to ICU's date format service, and
  `com.ibm.icu.samples.util.timezone` for samples related to time zone service.
* Sample code should be as self-contained as possible (use only JDK and
  ICU public APIs if possible). This allows readers to cut & paste a code
  snippet to try it out easily.
* The citing comment should start with three consecutive hyphens followed by
  a lower camel case token, for example, "`// ---compareToExample`".
* Keep in mind that the JCite tag `{@.jcite ...}` is not resolved without JCite.
  It is encouraged to avoid placing a code snippet within a sentence. Instead,
  place a code snippet using JCite in an independent paragraph.

#### Embedding Coding Examples in ICU4C

Also since ICU4C 49M2, ICU4C docs (using the [\\snippet command](http://www.doxygen.nl/manual/commands.html#cmdsnippet)
which is new in Doxygen 1.7.5) can cite a fragment of existing sample or test code.

Example in `ucnv.h`:

```c++
 /**
  * \snippet samples/ucnv/convsamp.cpp ucnv_open
  */
 ucnv_open( ... ) ...
```

This cites code in `icu4c/source/samples/ucnv/convsamp.cpp` as follows:

```c++
  //! [ucnv_open]
  conv = ucnv_open("koi8-r", &status);
  //! [ucnv_open]
```

Notice the tag "`ucnv_open`" which must be the same in all three places (in
the header file, and twice in the cited file).

## C and C++ Coding Conventions Overview

The ICU group uses the following coding guidelines to create software using the
ICU C++ classes and methods as well as the ICU C methods.

### C/C++ Hiding Un-@stable APIs

In C/C++, we enclose `@draft` and such APIs with `#ifndef U_HIDE_DRAFT_API` or
similar as appropriate. When a draft API becomes stable, we need to remove the
surrounding `#ifndef`.

Note: The `@system` tag is *in addition to* the
`@draft`/`@stable`/`@deprecated`/`@obsolete` status tag.

Copy/paste the appropriate `#ifndef..#endif` pair from the following:

```c++
#ifndef U_HIDE_DRAFT_API
#endif  // U_HIDE_DRAFT_API

#ifndef U_HIDE_DEPRECATED_API
#endif  // U_HIDE_DEPRECATED_API

#ifndef U_HIDE_OBSOLETE_API
#endif  // U_HIDE_OBSOLETE_API

#ifndef U_HIDE_SYSTEM_API
#endif  // U_HIDE_SYSTEM_API

#ifndef U_HIDE_INTERNAL_API
#endif  // U_HIDE_INTERNAL_API
```

We `#ifndef` `@draft`/`@deprecated`/... APIs as much as possible, including C
functions, many C++ class methods (see exceptions below), enum constants (see
exceptions below), whole enums, whole classes, etc.

We do not `#ifndef` APIs where that would be problematic:

* struct/class members where that would modify the object layout (non-static
  struct/class fields, virtual methods)
* enum constants where that would modify the numeric values of following
  constants
  * actually, best to use `#ifndef` together with explicitly defining the
    numeric value of the next constant
* C++ class boilerplate (e.g., default/copy constructors), if
  the compiler would auto-create public functions to replace `#ifndef`'ed ones
  * For example, the compiler automatically creates a default constructor if
    the class does not specify any other constructors.
* private class members
* definitions in internal/test/tools header files (that would be pointless;
  they should probably not have API tags in the first place)
* forward or friend declarations
* definitions that are needed for other definitions that would not be
  `#ifndef`'ed (e.g., for public macros or private methods)
* platform macros (mostly in `platform.h`/`umachine.h` & similar) and
  user-configurable settings (mostly in `uconfig.h`)
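The enum case from the list above might look like the following sketch. The enum `UFooMode` and its constants are hypothetical names invented for illustration; only the `U_HIDE_DRAFT_API` guard comes from the text above.

```c++
// Hypothetical enum; the UFOO_* names are illustrative only.
typedef enum UFooMode {
    UFOO_MODE_DEFAULT = 0,
#ifndef U_HIDE_DRAFT_API
    /** Hypothetical draft constant, hidden when U_HIDE_DRAFT_API is defined. */
    UFOO_MODE_EXPERIMENTAL = 1,
#endif  // U_HIDE_DRAFT_API
    /** Explicit value: stays 2 whether or not the draft constant is hidden. */
    UFOO_MODE_STRICT = 2
} UFooMode;

static_assert(UFOO_MODE_STRICT == 2, "value must not depend on U_HIDE_DRAFT_API");
```

Without the explicit `= 2`, hiding the draft constant would renumber `UFOO_MODE_STRICT` to 1 and change its meaning for code compiled with the guard defined.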

More handy copy-paste text:

```c++
    // Do not enclose the protected default constructor with #ifndef U_HIDE_INTERNAL_API
    // or else the compiler will create a public default constructor.

    // Do not enclose protected default/copy constructors with #ifndef U_HIDE_INTERNAL_API
    // or else the compiler will create public ones.
```

### C and C++ Type and Format Convention Guidelines

The following C and C++ type and format conventions are used to maximize
portability across platforms and to provide consistency in the code:

#### Constants (#define, enum items, const)

Use uppercase letters for constants. For example, use `UBREAKITERATOR_DONE`,
`UBIDI_DEFAULT_LTR`, `ULESS`.

For new enum types (as opposed to new values added to existing types), do not
define enum types in C++ style. Instead, define C-style enums with U... type
prefix and `U_`/`UMODULE_` constants. Define such enum types outside the ICU
namespace and outside any C++ class. Define them in C header files if there are
appropriate ones.

#### Variables and Functions

Use mixed-case letters that start with a lowercase letter for variables and
functions. For example, use `getLength()`.

#### Types (class, struct, enum, union)

Use mixed-case letters that start with an uppercase letter for types. For
example, use class `DateFormatSymbols`.

#### Function Style

Use the `getProperty()` and `setProperty()` style for functions where a lowercase
letter begins the first word and the second word is capitalized without a space
between it and the first word. For example, `UnicodeString`
`getSymbol(ENumberFormatSymbol symbol)`,
`void setSymbol(ENumberFormatSymbol symbol, UnicodeString value)` and
`getLength()`, `getSomethingAt(index/offset)`.

#### Common Parameter Names

In order to keep function parameter names consistent, the following are
recommendations for names or suffixes (usual "Camel case" applies):

* "start": the index (of the first of several code units) in a string or array
* "limit": the index (of the **first code unit after** a specified range) in a
  string or array (the number of units is (limit-start))
* name the length (for the number of code units in a (range of a) string or
  array) either "length" or "somePrefixLength"
* name the capacity (for the number of code units available in an output
  buffer) either "capacity" or "somePrefixCapacity"

#### Order of Source/Destination Arguments

Many ICU function signatures list source arguments before destination arguments,
as is common in C++ and Java APIs. This is the preferred order for new APIs.
(Example: `ucol_getSortKey(const UCollator *coll, const UChar *source,
int32_t sourceLength, uint8_t *result, int32_t resultLength)`)

Some ICU function signatures list destination arguments before source arguments,
as is common in C standard library functions. This should be limited to
functions that closely resemble such C standard library functions or closely
related ICU functions. (Example: `u_strcpy(UChar *dst, const UChar *src)`)

#### Order of Include File Includes

Include system header files (like `<stdio.h>`) before ICU headers followed by
application-specific ones. This assures that ICU headers can use existing
definitions from system headers if both happen to define the same symbols. In
ICU files, all used headers should be explicitly included, even if some of them
already include others.

Within a group of headers, place them in alphabetical order.

#### Style for ICU Includes

All ICU headers should be included using ""-style includes (like
`"unicode/utypes.h"` or `"cmemory.h"`) in source files for the ICU library, tools,
and tests.

#### Pointer Conversions

Do not cast pointers to integers or integers to pointers. Also, do not cast
between data pointers and function pointers. This will not work on some
compilers, especially with different sizes of such types. Exceptions are only
possible in platform-specific code where the behavior is known.

Please use C++-style casts, at least for pointers, for example `const_cast`.

* For conversion between related types, for example from a base class to a
  subclass (when you *know* that the object is of that type), use
  `static_cast`. (When you are not sure if the object has the subclass type,
  then use a `dynamic_cast`; see a later section about that.)
* Also use `static_cast`, not `reinterpret_cast`, for conversion from `void *`
  to a specific pointer type. (This is accepted and recommended because there
  is an implicit conversion available for the opposite conversion.) See
  [ICU-9434](https://unicode-org.atlassian.net/browse/ICU-9434) for details.
* For conversion between unrelated types, for example between `char *` and
  `uint8_t *`, or between `Collator *` and `UCollator *`, use a
  `reinterpret_cast`.
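The three cases in the list above can be illustrated with a short sketch. The types `Base` and `Derived` and the helper functions are hypothetical and exist only to show which cast applies where.

```c++
#include <cstdint>

// Hypothetical class pair used only for this illustration.
struct Base { virtual ~Base() {} };
struct Derived : Base { int32_t value = 42; };

// Base -> subclass when the dynamic type is known: static_cast.
int32_t valueOf(Base *b) { return static_cast<Derived *>(b)->value; }

// void * -> specific pointer type: static_cast, not reinterpret_cast.
int32_t readInt(void *context) { return *static_cast<int32_t *>(context); }

// Unrelated pointer types (char * and uint8_t *): reinterpret_cast.
uint8_t firstByte(const char *s) { return *reinterpret_cast<const uint8_t *>(s); }
```

The `void *` case typically arises in C callbacks that receive an opaque context pointer.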

#### Returning a Number of Items

To return a number of items, use `countItems()`, **not** `getItemCount()`, even if
there is no need to actually count using that member function.

#### Ranges of Indexes

Specify a range of indexes by having start and limit parameters with names or
suffix conventions that represent the index. A range should contain indexes from
start to limit-1 such as an interval that is left-closed and right-open. Using
mathematical notation, this is represented as: \[start..limit\[.
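A minimal sketch of a function that follows the start/limit convention is shown below. The function `countUnit()` is a hypothetical example, not an ICU API.

```c++
#include <cstdint>

// Counts occurrences of `unit` in the half-open range s[start..limit[.
// The unit at index `limit` itself is never examined.
int32_t countUnit(const char16_t *s, int32_t start, int32_t limit, char16_t unit) {
    int32_t count = 0;
    for (int32_t i = start; i < limit; ++i) {
        if (s[i] == unit) { ++count; }
    }
    return count;
}
```

With this convention the number of units in the range is simply `limit - start`, and an empty range is expressed as `start == limit`.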

#### Functions with Buffers

For functions that take a buffer (pointer) and a length argument with a default
value, set the default value to -1 so that the function determines the length
of the input itself (for text, by calling `u_strlen()`). Any other negative or
undefined value constitutes an error.
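The -1 convention might be handled as in the following sketch. Both functions are hypothetical: `unitLength()` stands in for `u_strlen()`, and `resolveLength()` reports an illegal length through its return value, where real ICU code would set `U_ILLEGAL_ARGUMENT_ERROR`.

```c++
#include <cstdint>

// Stand-in for u_strlen(): counts char16_t units before the terminating NUL.
int32_t unitLength(const char16_t *s) {
    int32_t n = 0;
    while (s[n] != 0) { ++n; }
    return n;
}

// Hypothetical helper: resolves a length argument whose default is -1.
// Returns false for any other negative value, which is an error.
bool resolveLength(const char16_t *s, int32_t &resolved, int32_t length = -1) {
    if (length < -1) { return false; }
    resolved = (length == -1) ? unitLength(s) : length;
    return true;
}
```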

#### Primitive Types

Primitive types are defined by the `unicode/utypes.h` file or a header file that
includes other header files. The most common types are `uint8_t`, `uint16_t`,
`uint32_t`, `int8_t`, `int16_t`, `int32_t`, `char16_t`,
`UChar` (same as `char16_t`), `UChar32` (signed, 32-bit), and `UErrorCode`.

The language built-in type `bool` and constants `true` and `false` may be used
internally, for local variables and parameters of internal functions. The ICU
type `UBool` must be used in public APIs and in the definition of any persistent
data structures. `UBool` is guaranteed to be one byte in size and signed; `bool` is
not.

Traditionally, ICU4C has defined its own `FALSE`=0 / `TRUE`=1 macros for use with `UBool`.
Starting with ICU 68 (2020q4), we no longer define these in public header files
(unless `U_DEFINE_FALSE_AND_TRUE`=1),
in order to avoid name collisions with code outside ICU defining enum constants and similar
with these names.

Instead, the versions of the C and C++ standards we require now do define type `bool`
and values `false` & `true`, and we and our users can use these values.

As of ICU 68, we are not changing ICU4C API from `UBool` to `bool`.
Doing so in C API, or in structs that cross the library boundary,
would break binary compatibility.
Doing so only in other places in C++ could be confusingly inconsistent.
We may revisit this.

Note that the details of type `bool` (e.g., `sizeof`) depend on the compiler and
may differ between C and C++.

#### File Names (.h, .c, .cpp, data files if possible, etc.)

Limit file names to 31 lowercase ASCII characters. (Older versions of MacOS have
that length limit.)

Exception: The layout engine uses mixed-case file names.

(We have abandoned the 8.3 naming standard although we do not change the names
of old header files.)

#### Language Extensions and Standards

Proprietary features, language extensions, or library functions, must not be
used because they will not work on all C or C++ compilers.
In Microsoft Visual C++, go to Project Settings(alt-f7)->All Configurations->
C/C++->Customize and check Disable Language Extensions.

Exception: some Microsoft headers will not compile without language extensions
being enabled, which in turn requires some ICU files be built with language
extensions.

#### Tabs and Indentation

Save files with spaces instead of tab characters (\\x09). The indentation size
is 4.

#### Documentation

Use Javadoc-style in-file documentation created with
[doxygen](http://www.doxygen.org/).

#### Multiple Statements

Place multiple statements in multiple lines. `if()` or loop heads must not be
followed by their bodies on the same line.

#### Placements of `{}` Curly Braces

Place curly braces `{}` in reasonable and consistent locations. Each of us
subscribes to different philosophies. It is recommended to use the style of a
file, instead of mixing different styles. It is requested, however, to not have
`if()` and loop bodies without curly braces.

#### `if() {...}` and Loop Bodies

Use curly braces for `if()` and `else` as well as loop bodies, etc., even if there
is only one statement.

#### Function Declarations

Place the return type together with all import, extern, and export declarations
on one line, and place the function name and function signature at the
beginning of the next line.

Function declarations need to be in the form `U_CAPI` return-type `U_EXPORT2` to
satisfy all the compilers' requirements.

For example, use the following
convention:

```c++
U_CAPI int32_t U_EXPORT2
u_formatMessage(...);
```

> :point_right: **Note**: The `U_CAPI`/`U_DEPRECATED` and `U_EXPORT2` qualifiers
> are required for both the declaration and the definition of *exported C and
> static C++ functions*. Use `U_CAPI` (or `U_DEPRECATED`) before and `U_EXPORT2`
> after the return type of *exported C and static C++ functions*.
>
> Internal functions that are visible outside a compilation unit need a `U_CFUNC`
> before the return type.
>
> *Non-static C++ class member functions* do *not* get `U_CAPI`/`U_EXPORT2`
> because they are exported and declared together with their class exports.

> :point_right: **Note**: Before ICU 68 (2020q4) we used to use alternate qualifiers
> like `U_DRAFT`, `U_STABLE` etc. rather than `U_CAPI`,
> but keeping these in sync with API doc tags `@draft` and guard switches like `U_HIDE_DRAFT_API`
> was tedious and error-prone and added no value.
> Since ICU 68 (ICU-9961) we only use `U_CAPI` and `U_DEPRECATED`.

#### Use Anonymous Namespaces or Static For File Scope

Use anonymous namespaces or `static` for variables, functions, and constants that
are not exported explicitly by a header file. Some platforms are confused if
non-static symbols are not explicitly declared extern. These platforms will not
be able to build ICU nor link to it.

#### Using C Callbacks From C++ Code

z/OS and Windows COM wrappers around ICU need `__cdecl` for callback functions.
The reason is that C++ can have a different function calling convention from C.
These callback functions also usually need to be private. So the following code

```c++
UBool
isAcceptable(void * /* context */,
             const char * /* type */, const char * /* name */,
             const UDataInfo *pInfo)
{
    // Do something here.
}
```

should be changed to look like the following by adding `U_CDECL_BEGIN`, `static`,
`U_CALLCONV` and `U_CDECL_END`.

```c++
U_CDECL_BEGIN
static UBool U_CALLCONV
isAcceptable(void * /* context */,
             const char * /* type */, const char * /* name */,
             const UDataInfo *pInfo)
{
    // Do something here.
}
U_CDECL_END
```

#### Same Module and Functionality in C and in C++

Determine if two headers are needed. If the same functionality is provided with
both a C and a C++ API, then there can be two headers, one for each language,
even if one uses the other. For example, there can be `umsg.h` for C and `msgfmt.h`
for C++.

Not all functionality has or needs both kinds of API. More and more
functionality is available only via C APIs to avoid duplication of API,
documentation, and maintenance. C APIs are perfectly usable from C++ code,
especially with `UnicodeString` methods that alias or expose C-style string
buffers.

#### Platform Dependencies

Use the platform dependencies that are within the header files that `utypes.h`
includes. They are `platform.h` (which is generated by the configuration
script from `platform.h.in`) and its more specific cousins like `pwin32.h` for
Windows, which define basic types, and `putil.h`, which defines platform
utilities.
**Important:** Outside of these files, and a small number of implementation
files that depend on platform differences (like `umutex.c`), **no** ICU source
code may have **any** `#ifdef` **OperatingSystemName** instructions.

#### Short, Unnested Mutex Blocks

Do not call functions from within a mutual-exclusion (mutex) block. This can
prevent deadlocks from occurring later. There should be as
little code inside a mutex block as possible to minimize the performance
degradation from blocked threads.
Also, it is not guaranteed that mutex blocks are re-entrant; therefore, they
must not be nested.

#### Names of Internal Functions

Internal functions that are not declared static (regardless of inlining) must
follow the naming conventions for exported functions because many compilers and
linkers do not distinguish between library exports and intra-library visible
functions.

#### Which Language for the Implementation

Write implementation code in C++. Use objects very carefully, as always:
Implicit constructors, assignments etc. can make simple-looking code
surprisingly slow.

For every C API, make sure that there is at least one call from a pure C file in
the cintltst test suite.

Background: We used to prefer C or C-style C++ for implementation code because
we used to have users ask for pure C. However, there was never a large, usable
subset of ICU that was usable without any C++ dependencies, and C++ can(!) make
for much shorter, simpler, less error-prone and easier-to-maintain code, for
example via use of "smart pointers" (`unicode/localpointer.h` and `cmemory.h`).

We still try to expose most functionality via *C APIs* because of the
difficulties of binary compatible C++ APIs exported from DLLs/shared libraries.

#### No Compiler Warnings

ICU must compile without compiler warnings unless such warnings are verified to
be harmless or bogus. Often a warning on one compiler indicates a breaking
error on another.

#### Enum Values

When casting an integer value to an enum type, the enum type *should* have a
constant with this integer value, or at least it *must* have a constant whose
value is at least as large as the integer value being cast, with the same
signedness. For example, do not cast a -1 to an enum type that only has
non-negative constants. Some compilers choose the internal representation very
tightly for the defined enum constants, which may result in the equivalent of a
`uint8_t` representation for an enum type with only small, non-negative constants.
Casting a -1 to such a type may result in an actual value of 255. (This has
happened!)

When casting an enum value to an integer type, make sure that the enum value's
numeric value is within range of the integer type.

#### Do not check for `this!=NULL`, do not check for `NULL` references

In public APIs, assume `this!=0` and assume that references are not 0. In C code,
`"this"` is the "service object" pointer, such as `set` in
`uset_add(USet* set, UChar32 c)`; do not check for `set!=NULL`.

We do usually check all other (non-this) pointers for `NULL`, in those cases when
`NULL` is not valid. (Many functions allow a `NULL` string or buffer pointer if the
length or capacity is 0.)

Rationale: `"this"` is not really an argument, and checking it costs a little bit
of code size and runtime. Other libraries also commonly do not check for valid
`"this"`, and resulting failures are fairly obvious.

### Memory Usage

#### Dynamically Allocated Memory

ICU4C APIs are designed to allow separate heaps for its libraries vs. the
application. This is achieved by providing factory methods and matching
destructors for all allocated objects. The C++ API uses a common base class with
overridden `new`/`delete` operators and/or forms an equivalent pair with `createXyz()`
factory methods and the `delete` operator. The C API provides pairs of `open`/`close`
functions for each service. See the C++ and C guideline sections below for
details.

Exception: Most C++ API functions that return a `StringEnumeration` (by pointer
which the caller must delete) are named `getXyz()` rather than `createXyz()`
because `"get"` is much more natural. (These are not factory methods in the sense
of `NumberFormat::createScientificInstance()`.) For example,
`static StringEnumeration *Collator::getKeywords(UErrorCode &)`. We should document
clearly in the API comments that the caller must delete the returned
`StringEnumeration`.
702
703#### Declaring Static Data
704
705All unmodifiable data should be declared `const`. This includes the pointers and
706the data itself. Also if you do not need a pointer to a string, declare the
707string as an array. This reduces the time to load the library and all its
708pointers. This should be done so that the same library data can be shared across
709processes automatically. Here is an example:
710
```c++
#define MY_MACRO_DEFINED_STR "macro string"
const char *myCString = "myCString";
int16_t myNumbers[] = {1, 2, 3};
```
716
717This should be changed to the following:
718
```c++
static const char MY_MACRO_DEFINED_STR[] = "macro string";
static const char myCString[] = "myCString";
static const int16_t myNumbers[] = {1, 2, 3};
```
724
725#### No Static Initialization
726
727The most common reason to have static initialization is to declare a
728`static const UnicodeString`, for example (see `utypes.h` about invariant characters):
729
```c++
static const UnicodeString myStr("myStr", "");
```
733
734The most portable and most efficient way to declare ASCII text as a Unicode
735string is to do the following instead:
736
```c++
static const UChar myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0}; /* "myStr" */
```
740
We do not use character literals for Unicode characters and strings because the
execution character set of C/C++ compilers is almost never Unicode and may not be
ASCII-compatible (especially on EBCDIC platforms). Depending on the API where the
string is to be used, a terminating NUL (0) may or may not be required. The length
of the string (number of `UChar`s in the array) can be determined with
`sizeof(myStr)/U_SIZEOF_UCHAR` (subtract 1 for the NUL if present). Always
remember to put a comment at the end of the declaration that says what the
Unicode string contains.
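
The length computation can be sketched as follows. This stand-alone version assumes `char16_t` in place of `UChar` and `sizeof(myStr[0])` in place of `U_SIZEOF_UCHAR`, so that it compiles without ICU headers; `myStrLength` is a name introduced only for this illustration.

```c++
#include <cstdint>

// char16_t stands in for ICU's UChar here (an assumption of this sketch).
static const char16_t myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0 }; /* "myStr" */

// Number of code units, not counting the terminating NUL.
static const int32_t myStrLength =
    (int32_t)(sizeof(myStr) / sizeof(myStr[0])) - 1;
```

Both values are compile-time constants of simple types, so no static initialization code needs to run when the library is loaded.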
749
Static initialization of C++ objects **must not be used** in ICU libraries
for the following reasons:

1. It leads to intractable order-of-initialization dependencies.
2. It makes it difficult or impossible to release all of the library's
   resources. See `u_cleanup()`.
3. It takes time to initialize the library.
4. Dependency checking is not completely done in C or C++. For instance, if an
   ICU user creates an ICU object or calls an ICU function statically that
   depends on static data, it is not guaranteed that the statically declared
   data is initialized.
5. Certain users like to manage their own memory. They cannot manage ICU's
   memory properly because of item #2.
6. It is easier to debug code that does not use static initialization.
7. Memory allocated at static initialization time is not guaranteed to be
   deallocated with a C++ destructor when the library is unloaded. This is a
   problem when ICU is unloaded and reloaded into memory and when you are using
   a heap debugging tool. It would also not work with the `u_cleanup()` function.
8. Some platforms cannot handle static initialization or static destruction
   properly. Several compilers have this random bug (even in the year 2001).
770
771ICU users can use the `U_STRING_DECL` and `U_STRING_INIT` macros for C strings. Note
772that on some platforms this will incur a small initialization cost (simple
773conversion). Also, ICU users need to make sure that they properly and
774consistently declare the strings with both macros. See `ustring.h` for details.
775
776### C++ Coding Guidelines
777
778This section describes the C++ specific guidelines or conventions to use.
779
780#### Portable Subset of C++
781
782ICU uses only a portable subset of C++ for maximum portability. Also, it does
783not use features of C++ that are not implemented well in all compilers or are
784cumbersome. In particular, ICU does not use exceptions, or the Standard Template
785Library (STL).
786
787We have started to use templates in ICU 4.2 (e.g., `StringByteSink`) and ICU 4.4
788(`LocalPointer` and some internal uses). We try to limit templates to where they
789provide a lot of benefit (robust code, avoid duplication) without much or any
790code bloat.
791
792We continue to not use the Standard Template Library (STL) in ICU library code
793because its design causes a lot of code bloat. More importantly:
794
795* Exceptions: STL classes and algorithms throw exceptions. ICU does not throw
796  exceptions, and ICU code is not exception-safe.
797* Memory management: STL uses default new/delete, or Allocator parameters
798  which create different types; they throw out-of-memory exceptions. ICU
799  memory allocation is customizable and must not throw exceptions.
800* Non-polymorphic: For APIs, STL classes are also problematic because
801  different template specializations create different types. For example, some
802  systems use custom string classes (different allocators, different
803  strategies for buffer sharing vs. copying), and ICU should be able to
804  interface with most of them.
805
806We have started to use compiler-provided Run-Time Type Information (RTTI) in ICU
8074.6. It is now required for building ICU, and encouraged for using ICU where
808RTTI is needed. For example, use `dynamic_cast<DecimalFormat*>` on a
809`NumberFormat` pointer that is usually but not always a `DecimalFormat` instance.
810Do not use `dynamic_cast<>` on a reference, because that throws a `bad_cast`
811exception on failure.
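
A minimal sketch of the pointer form of `dynamic_cast<>`, using hypothetical stand-in classes (`SketchNumberFormat`, `SketchDecimalFormat`, `SketchRuleBasedFormat`) rather than the real ICU hierarchy:

```c++
#include <cstddef>

// Hypothetical stand-ins for NumberFormat and two of its subclasses.
class SketchNumberFormat {
public:
    virtual ~SketchNumberFormat();
};
SketchNumberFormat::~SketchNumberFormat() {}

class SketchDecimalFormat : public SketchNumberFormat {
public:
    int getGroupingSize() const { return 3; }
};

class SketchRuleBasedFormat : public SketchNumberFormat {};

// Returns the grouping size if nf is really a SketchDecimalFormat, else fallback.
int groupingSizeOr(SketchNumberFormat *nf, int fallback) {
    // Pointer form: yields NULL on mismatch instead of throwing bad_cast.
    SketchDecimalFormat *df = dynamic_cast<SketchDecimalFormat *>(nf);
    if (df == NULL) {
        return fallback;
    }
    return df->getGroupingSize();
}
```

The caller handles the "not a `DecimalFormat`" case with an ordinary NULL check, which fits code that is not exception-safe.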
812
ICU uses a limited form of multiple inheritance equivalent to Java's interface
mechanism: All but one of the base classes must be interface/mixin classes, i.e.,
they must contain only pure virtual member functions. For details see the
'boilerplate' discussion below. This restriction to at most one base class with
non-virtual members eliminates problems with the use and implementation of
multiple inheritance in C++. ICU does not use virtual base classes.
819
820> :point_right: **Note**: Every additional base class, *even an interface/mixin
821class*, adds another vtable pointer to each subclass object, that is, it
822*increases the object/instance size by 8 bytes* on most platforms.
823
824#### Classes and Members
825
826C++ classes and their members do not need a 'U' or any other prefix.
827
828#### Global Operators
829
830Global operators (operators that are not class members) can be problematic for
831library entry point versioning, may confuse users and cannot be easily ported to
832Java (ICU4J). They should be avoided if possible.
833
834~~The issue with library entry point versioning is that on platforms that do not
835support namespaces, users must rename all classes and global functions via
836urename.h. This renaming process is not possible with operators.~~ Starting with
837ICU 49, we require C++ namespace support. However, a global operator can be used
838in ICU4C (when necessary) if its function signature contains an ICU C++ class
839that is versioned. This will result in a mangled linker name that does contain
840the ICU version number via the versioned name of the class parameter. For
841example, ICU4C 2.8 added an operator + for `UnicodeString`, with two `UnicodeString`
842reference parameters.
843
844#### Virtual Destructors
845
846In classes with virtual methods, destructors must be explicitly declared, and
847must be defined (implemented) outside the class definition in a .cpp file.
848
849More precisely:
850
8511. All classes with any virtual members or any bases with any virtual members
852   should have an explicitly declared virtual destructor.
8532. Constructors and destructors should be declared and/or defined prior to
854   *any* other methods, public or private, within the class definition.
8553. All virtual destructors should be defined out-of-line, and in a .cpp file
856   rather than a header file.
857
858This is so that the destructors serve as "key functions" so that the compiler
859emits the vtable in only and exactly the desired files. It can help make
860binaries smaller that use statically-linked ICU libraries, because the compiler
861and linker can prove more easily that some code is not used.
862
863The Itanium C++ ABI (which is used on all x86 Linux) says: "The virtual table
864for a class is emitted in the same object containing the definition of its key
865function, i.e. the first non-pure virtual function that is not inline at the
866point of class definition. If there is no key function, it is emitted everywhere
867used."
868
(This was first done in ICU 49; see [ticket #8454](https://unicode-org.atlassian.net/browse/ICU-8454).)
870
871#### Namespaces
872
873Beginning with ICU version 2.0, ICU uses namespaces. The actual namespace is
874`icu_M_N` with M being the major ICU release number and N being the minor ICU
875release number. For convenience, the namespace `icu` is an alias to the current
876release-specific one. (The actual namespace name is `icu` itself if renaming is
877turned off.)
878
879Starting with ICU 49, we require C++ namespace support.
880
881Class declarations, even forward declarations, must be scoped to the ICU
882namespace. For example:
883
```c++
U_NAMESPACE_BEGIN

class Locale;

U_NAMESPACE_END

// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
extern void fn(icu::UnicodeString&);

// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
// automatically set by utypes.h
// but recommended to be not set automatically
U_NAMESPACE_USE
Locale loc("fi");
```
900
`U_NAMESPACE_USE` (expands to `using namespace icu_M_N;` when available) is
automatically done when `utypes.h` is included, so that all ICU classes are
immediately usable. However, we recommend that you turn this off via
`CXXFLAGS="-DU_USING_ICU_NAMESPACE=0"`.
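
The relationship between the versioned namespace, its alias, and explicitly qualified names can be sketched without ICU headers. The names `sketch_74_1`, `sketch`, and `languageOf` below are hypothetical; they only mimic the roles of `icu_M_N` and `icu`.

```c++
// Hypothetical names: sketch_74_1 mimics a versioned namespace such as icu_M_N.
namespace sketch_74_1 {
class Locale {
public:
    explicit Locale(const char *id) : language(id) {}
    const char *getLanguage() const { return language; }
private:
    const char *language;
};
}  // namespace sketch_74_1

// Version-independent alias, analogous to `icu`.
namespace sketch = sketch_74_1;

// Application code qualifies names explicitly instead of relying on a
// blanket `using namespace` directive.
const char *languageOf(const sketch::Locale &loc) {
    return loc.getLanguage();
}
```

Because the alias and the versioned name refer to the same namespace, `sketch::Locale` and `sketch_74_1::Locale` are the same type.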
905
906#### Declare Class APIs
907
908Class APIs need to be declared like either of the following:
909
910#### Inline-Implemented Member Functions
911
912Class member functions are usually declared but not inline-implemented in the
913class declaration. A long function implementation in the class declaration makes
914it hard to read the class declaration.
915
916It is ok to inline-implement *trivial* functions in the class declaration.
917Pretty much everyone agrees that inline implementations are ok if they fit on
918the same line as the function signature, even if that means bending the
919single-statement-per-line rule slightly:
920
```c++
T *orphan() { T *p=ptr; ptr=NULL; return p; }
```
924
925Most people also agree that very short multi-line implementations are ok inline
926in the class declaration. Something like the following is probably the maximum:
927
```c++
Value *getValue(int index) {
    if(index>=0 && index<fLimit) {
        return fArray[index];
    }
    return NULL;
}
```
936
937If the inline implementation is longer than that, then just declare the function
938inline and put the actual inline implementations after the class declaration in
939the same file. (See `unicode/unistr.h` for many examples.)
940
941If it's significantly longer than that, then it's probably not a good candidate
942for inlining anyway.
943
944#### C++ class layout and 'boilerplate'
945
There are different sets of requirements for different kinds of C++ classes. In
general, all instantiable classes (i.e., all classes except for interface/mixin
classes and ones with only static member functions) inherit the `UMemory` base
class. `UMemory` provides `new`/`delete` operators, which make it possible to keep
the ICU heap separate from the application heap, or to customize ICU's memory
allocation consistently.
952
953> :point_right: **Note**: Public ICU APIs must return or orphan only C++ objects
954that are to be released with `delete`. They must not return allocated simple
955types (including pointers, and arrays of simple types or pointers) that would
956have to be released with a `free()` function call using the ICU library's heap.
957Simple types and pointers must be returned using fill-in parameters (instead of
958allocation), or cached and owned by the returning API.
959
960**Public ICU C++ classes** must inherit either the `UMemory` or the `UObject`
961base class for proper memory management, and implement the following common set
962of 'boilerplate' functions:
963
964* default constructor
965* copy constructor
966* assignment operator
967* operator==
968* operator!=
969
970> :point_right: **Note**: Each of the above either must be implemented, verified
971that the default implementation according to the C++ standard will work
972(typically not if any pointers are used), or declared private without
973implementation.
974
975* If public subclassing is intended, then the public class must inherit
976  `UObject` and should implement
977  * `clone()`
978* **RTTI:**
979  * If a class is a subclass of a parent (e.g., `Format`) with ICU's "poor
980    man's RTTI" (Run-Time Type Information) mechanism (via
981    `getDynamicClassID()` and `getStaticClassID()`) then add that to the new
982    subclass as well (copy implementations from existing C++ APIs).
983  * If a class is a new, immediate subclass of `UObject` (e.g.,
984    `Normalizer2`), creating a whole new class hierarchy, then declare a
985    *private* `getDynamicClassID()` and define it to return `NULL` (to
986    override the pure virtual version in `UObject`); copy the relevant lines
987    from `normalizer2.h` and `normalizer2.cpp`
988    (`UOBJECT_DEFINE_NO_RTTI_IMPLEMENTATION(className)`). Do not add any
989    "poor man's RTTI" at all to subclasses of this class.
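
As an illustration of the five 'boilerplate' functions, here is a minimal stand-alone value class. `SketchKey` is hypothetical; a real public ICU class would also inherit `UMemory` or `UObject`, which this sketch omits so that it compiles without ICU headers.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical value class showing the common boilerplate functions.
class SketchKey {
public:
    SketchKey() : length(0) { chars[0] = 0; }                  // default constructor
    explicit SketchKey(const char *s) : length(0) {
        while (s[length] != 0 && length < kCapacity - 1) {
            chars[length] = s[length];
            ++length;
        }
        chars[length] = 0;
    }
    SketchKey(const SketchKey &other) { copyFrom(other); }     // copy constructor
    SketchKey &operator=(const SketchKey &other) {             // assignment operator
        if (this != &other) { copyFrom(other); }
        return *this;
    }
    bool operator==(const SketchKey &other) const {
        return length == other.length &&
               std::memcmp(chars, other.chars, (std::size_t)length) == 0;
    }
    bool operator!=(const SketchKey &other) const { return !operator==(other); }

private:
    enum { kCapacity = 16 };
    void copyFrom(const SketchKey &other) {
        length = other.length;
        std::memcpy(chars, other.chars, (std::size_t)length + 1);  // include the NUL
    }
    char chars[kCapacity];
    int32_t length;
};
```

Here `operator==` compares only the used part of the buffer, which is one reason to write it by hand rather than rely on a default.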
990
991**Interface/mixin classes** are equivalent to Java interfaces. They are as much
992multiple inheritance as ICU uses — they do not decrease performance, and they do
993not cause problems associated with multiple base classes having data members.
994Interface/mixin classes contain only pure virtual member functions, and must
995contain an empty virtual destructor. See for example the `UnicodeMatcher` class.
996Interface/mixin classes must not inherit any non-interface/mixin class,
997especially not `UMemory` or `UObject`. Instead, implementation classes must inherit
998one of these two (or a subclass of them) in addition to the interface/mixin
999classes they implement. See for example the `UnicodeSet` class.
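
The shape of an interface/mixin class and of a class implementing it can be sketched as follows. `SketchMatcher`, `SketchObject`, and `SketchRangeSet` are hypothetical stand-ins for the roles played by `UnicodeMatcher`, `UObject`, and `UnicodeSet`; they are not the real declarations.

```c++
#include <cstdint>

// Interface/mixin: only pure virtual functions plus an empty virtual destructor.
class SketchMatcher {
public:
    virtual ~SketchMatcher();
    virtual bool matches(int32_t c) const = 0;
};
SketchMatcher::~SketchMatcher() {}

// Stand-in for the one "real" base class (the role UObject plays in ICU).
class SketchObject {
public:
    virtual ~SketchObject();
};
SketchObject::~SketchObject() {}

// Implementation class: one non-interface base plus the interface it implements.
class SketchRangeSet : public SketchObject, public SketchMatcher {
public:
    SketchRangeSet(int32_t start, int32_t end) : fStart(start), fEnd(end) {}
    virtual ~SketchRangeSet();
    virtual bool matches(int32_t c) const { return fStart <= c && c <= fEnd; }
private:
    int32_t fStart, fEnd;
};
SketchRangeSet::~SketchRangeSet() {}
```

The destructors are defined outside the class bodies, in line with the virtual destructor guideline earlier in this document.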
1000
1001**Static classes** contain only static member functions and are therefore never
1002instantiated. They must not inherit `UMemory` or `UObject`. Instead, they must
1003declare a private default constructor (without any implementation) to prevent
1004instantiation. See for example the `LESwaps` layout engine class.
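
A minimal sketch of such a static class, with a hypothetical name (`SketchSwaps`) and a single byte-swapping helper:

```c++
#include <cstdint>

// Hypothetical utility class in the style of a "static class".
class SketchSwaps {
public:
    // Swaps the two bytes of a 16-bit value.
    static uint16_t swapWord(uint16_t value) {
        return (uint16_t)((value << 8) | (value >> 8));
    }
private:
    SketchSwaps();  // declared but never defined: prevents instantiation
};
```

Callers use `SketchSwaps::swapWord(x)` directly; an attempt to declare a `SketchSwaps` object fails to compile because the constructor is private.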
1005
1006**C++ classes internal to ICU** need not (but may) implement the boilerplate
1007functions as mentioned above. They must inherit at least `UMemory` if they are
1008instantiable.
1009
1010#### Make Sure The Compiler Uses C++
1011
1012The `__cplusplus` macro being defined ensures that the compiler uses C++. Starting
1013with ICU 49, we use this standard predefined macro.
1014
1015Up until ICU 4.8 we used to define and use `XP_CPLUSPLUS` but that was redundant
1016and did not add any value because it was defined if-and-only-if `__cplusplus` was
1017defined.
1018
1019#### Adoption of Objects
1020
1021Some constructors and factory functions take pointers to objects that they
1022adopt. The newly created object contains a pointer to the adoptee and takes over
1023ownership and lifecycle control. If an error occurs while creating the new
1024object (and thus in the code that adopts an object), then the semantics used
1025within ICU must be *adopt-on-call* (as opposed to, for example,
1026adopt-on-success):
1027
1028* **General**: A constructor or factory function that adopts an object does so
1029  in all cases, even if an error occurs and a `UErrorCode` is set. This means
1030  that either the adoptee is deleted immediately or its pointer is stored in
1031  the new object. The former case is most common when the constructor or
1032  factory function is called and the `UErrorCode` already indicates a failure.
1033  In the latter case, the new object must take care of deleting the adoptee
1034  once it is deleted itself regardless of whether or not the constructor was
1035  successful.
1036
1037* **Constructors**: The code that creates the object with the new operator
1038  must check the resulting pointer returned by new and delete any adoptees if
1039  it is 0 because the constructor was not called. (Typically, a `UErrorCode`
1040  must be set to `U_MEMORY_ALLOCATION_ERROR`.)
1041
1042  **Pitfall**: If you allocate/construct via "`ClassName *p = new ClassName(adoptee);`"
1043  and the memory allocation failed (`p==NULL`), then the
1044  constructor has not been called, the adoptee has not been adopted, and you
1045  are still responsible for deleting it!
1046
1047* **Factory functions (createInstance())**: The factory function must set a
1048  `U_MEMORY_ALLOCATION_ERROR` and delete any adoptees if it cannot allocate the
1049  new object. If the construction of the object fails otherwise, then the
1050  factory function must delete it and the factory function must delete its
1051  adoptees. As a result, a factory function always returns either a valid
1052  object and a successful `UErrorCode`, or a 0 pointer and a failure `UErrorCode`.
1053  A factory function returns a pointer to an object that must be deleted by
1054  the user/owner.
1055
1056Example: (This is a best-practice example. It does not reflect current `Calendar`
1057code.)
1058
```c++
Calendar*
Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
    LocalPointer<TimeZone> adoptedZone(zone);
    if(U_FAILURE(errorCode)) {
        // The adoptedZone destructor deletes the zone.
        return NULL;
    }
    // since the Locale isn't specified, use the default locale
    LocalPointer<Calendar> c(new GregorianCalendar(zone, Locale::getDefault(), errorCode));
    if(c.isNull()) {
        errorCode = U_MEMORY_ALLOCATION_ERROR;
        // The adoptedZone destructor deletes the zone.
        return NULL;
    }
    // The constructor was called, so c adopted the zone (adopt-on-call).
    // Give up our ownership now to avoid deleting the zone twice.
    adoptedZone.orphan();
    if(U_FAILURE(errorCode)) {
        // The c destructor deletes the Calendar, which deletes the zone.
        return NULL;
    }
    return c.orphan();
}
```
1079
1080#### Memory Allocation
1081
1082All ICU C++ class objects directly or indirectly inherit `UMemory` (see
1083'boilerplate' discussion above) which provides `new`/`delete` operators, which in
1084turn call the internal functions in `cmemory.c`. Creating and releasing ICU C++
1085objects with `new`/`delete` automatically uses the ICU allocation functions.
1086
1087> :point_right: **Note**: Remember that (in absence of explicit :: scoping) C++
1088determines which `new`/`delete` operator to use from which type is allocated or
1089deleted, not from the context of where the statement is. Since non-class data
1090types (like `int`) cannot define their own `new`/`delete` operators, C++ always
1091uses the global ones for them by default.
1092
1093When global `new`/`delete` operators are to be used in the application (never inside
1094ICU!), then they should be properly scoped as e.g. `::new`, and the application
1095must ensure that matching `new`/`delete` operators are used. In some cases where
1096such scoping is missing in non-ICU code, it may be simpler to compile ICU
1097without its own `new`/`delete` operators. See `source/common/unicode/uobject.h` for
1098details.
1099
In ICU library code, allocation of non-class data types (simple integer types
**as well as pointers**) must use the functions in `cmemory.h`/`.c` (`uprv_malloc()`,
`uprv_free()`, `uprv_realloc()`). Such memory objects must be released inside ICU,
never by the user; this is achieved either by providing a "close" function for a
service or by not passing ownership of these objects to the user (and
instead filling user-provided buffers or returning constant pointers without
passing ownership).
1107
1108The `cmemory.h`/`.c` functions can be overridden at ICU compile time for custom
1109memory management. By default, `UMemory`'s `new`/`delete` operators are
1110implemented by calling these common functions. Overriding the `cmemory.h`/`.c`
1111functions changes the memory management for both C and C++.
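
The mechanism can be sketched with a stand-alone base class whose `new`/`delete` operators forward to replaceable allocation functions. Everything here (`sketch_malloc`, `sketch_free`, `SketchMemory`, `SketchCounter`, `gSketchLiveBlocks`) is hypothetical and only imitates the relationship between `UMemory` and the `cmemory` functions.

```c++
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for the cmemory functions; they count live blocks so
// the effect of routing allocations through them is observable.
int gSketchLiveBlocks = 0;

void *sketch_malloc(std::size_t size) {
    void *p = std::malloc(size);
    if (p != NULL) { ++gSketchLiveBlocks; }
    return p;
}

void sketch_free(void *p) {
    if (p != NULL) {
        --gSketchLiveBlocks;
        std::free(p);
    }
}

// Base class in the spirit of UMemory: class-specific new/delete that do not
// throw and that forward to the common allocation functions.
class SketchMemory {
public:
    static void *operator new(std::size_t size) noexcept { return sketch_malloc(size); }
    static void operator delete(void *p) noexcept { sketch_free(p); }
};

class SketchCounter : public SketchMemory {
public:
    SketchCounter() : value(0) {}
    int value;
};
```

With this in place, `new SketchCounter()` goes through `sketch_malloc()`, while `::new int(7)` still uses the global operator, which matches the note above about how C++ selects the operator from the allocated type.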
1112
1113C++ objects that were either allocated with new or returned from a `createXYZ()`
1114factory method must be deleted by the user/owner.
1115
1116#### Memory Allocation Failures
1117
1118All memory allocations and object creations should be checked for success. In
1119the event of a failure (a `NULL` returned), a `U_MEMORY_ALLOCATION_ERROR` status
1120should be returned by the ICU function in question. If the allocation failure
1121leaves the ICU service in an invalid state, such that subsequent ICU operations
1122could also fail, the situation should be flagged so that the subsequent
1123operations will fail cleanly. Under no circumstances should a memory allocation
1124failure result in a crash in ICU code, or cause incorrect results rather than a
1125clean error return from an ICU function.
1126
1127Some functions, such as the C++ assignment operator, are unable to return an ICU
1128error status to their caller. In the event of an allocation failure, these
1129functions should mark the object as being in an invalid or bogus state so that
1130subsequent attempts to use the object will fail. Deletion of an invalid object
1131should always succeed.
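
A minimal sketch of the "bogus state" approach for an assignment operator. `SketchBuffer`, `sketchAlloc`, and the `gSketchFailNextAlloc` hook are hypothetical; the hook exists only so that an allocation failure can be simulated.

```c++
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hook for this sketch only: makes the next allocation fail.
bool gSketchFailNextAlloc = false;

void *sketchAlloc(std::size_t size) {
    if (gSketchFailNextAlloc) {
        gSketchFailNextAlloc = false;
        return NULL;
    }
    return std::malloc(size);
}

// Hypothetical byte buffer whose assignment cannot report an error code.
class SketchBuffer {
public:
    SketchBuffer() : bytes(NULL), length(0), bogus(false) {}
    ~SketchBuffer() { std::free(bytes); }  // free(NULL) is a no-op: bogus objects delete cleanly
    bool isBogus() const { return bogus; }
    std::size_t size() const { return length; }
    void setTo(const char *src, std::size_t n) { copyFrom(src, n, false); }
    SketchBuffer &operator=(const SketchBuffer &other) {
        if (this != &other) { copyFrom(other.bytes, other.length, other.bogus); }
        return *this;
    }
private:
    SketchBuffer(const SketchBuffer &);  // not implemented in this sketch
    void copyFrom(const char *src, std::size_t n, bool srcBogus) {
        std::free(bytes);
        bytes = NULL;
        length = 0;
        bogus = srcBogus;
        if (bogus || n == 0) { return; }
        bytes = (char *)sketchAlloc(n);
        if (bytes == NULL) {
            bogus = true;  // remember the failure instead of crashing later
            return;
        }
        std::memcpy(bytes, src, n);
        length = n;
    }
    char *bytes;
    std::size_t length;
    bool bogus;
};
```

After a failed assignment the object is empty and reports `isBogus()`, and a later successful assignment brings it back to a valid state.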
1132
1133#### Memory Management
1134
1135C++ memory management is error-prone, and memory leaks are hard to avoid, but
1136the following helps a lot.
1137
1138First, if you can stack-allocate an object (for example, a `UnicodeString` or
1139`UnicodeSet`), do so. It is the easiest way to manage object lifetime.
1140
Inside functions, avoid raw pointers to owned objects. Instead, use
[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)`<UnicodeString>`
or `LocalUResourceBundlePointer` etc., which are ICU's "smart pointer"
implementations. This is the "[Resource Acquisition Is Initialization (RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
idiom. The "smart pointer" auto-deletes the object when it goes out of scope,
which means that you can just return from the function when an error occurs and
all auto-managed objects are deleted. You do not need to remember to write an
increasing number of "`delete xyz;`" at every function exit point.
1149
1150*In fact, you should almost never need to write "delete" in any function.*
1151
1152* Except in a destructor where you delete all of the objects which the class
1153  instance owns.
1154* Also, in static "cleanup" functions you still need to delete cached objects.
1155
1156When you pass on ownership of an object, for example to return the pointer of a
1157newly built object, or when you call a function which adopts your object, use
1158`LocalPointer`'s `.orphan()`.
1159
1160* Careful: When you return an object or pass it into an adopting factory
1161  method, you can use `.orphan()` directly.
1162* However, when you pass it into an adopting constructor, you need to pass in
1163  the `.getAlias()`, and only if the *allocation* of the new owner succeeded
1164  (you got a non-NULL pointer for that) do you `.orphan()` your `LocalPointer`.
1165* See the `Calendar::createInstance()` example above.
1166* See the `AlphabeticIndex` implementation for live examples. Search for other
1167  uses of `LocalPointer`/`LocalArray`.
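
The early-return and `.orphan()` patterns can be sketched with a much reduced smart pointer. `SketchLocalPointer`, `SketchItem`, and `buildItem` are hypothetical and far simpler than the real `LocalPointer`; the sketch also uses the global `new`, whereas ICU classes get a non-throwing `new` from `UMemory`.

```c++
#include <cstddef>

// Reduced, hypothetical imitation of a LocalPointer-style owner.
template<typename T>
class SketchLocalPointer {
public:
    explicit SketchLocalPointer(T *p = NULL) : ptr(p) {}
    ~SketchLocalPointer() { delete ptr; }
    bool isNull() const { return ptr == NULL; }
    T *getAlias() const { return ptr; }               // access without giving up ownership
    T *orphan() { T *p = ptr; ptr = NULL; return p; } // give up ownership
private:
    T *ptr;
    SketchLocalPointer(const SketchLocalPointer &);            // no copying
    SketchLocalPointer &operator=(const SketchLocalPointer &);
};

// Counts live instances so that leaks would be visible.
struct SketchItem {
    static int live;
    SketchItem() { ++live; }
    ~SketchItem() { --live; }
};
int SketchItem::live = 0;

// Returns a new item owned by the caller, or NULL if `fail` is set.
SketchItem *buildItem(bool fail) {
    SketchLocalPointer<SketchItem> item(new SketchItem());
    if (fail) {
        return NULL;       // item's destructor deletes the object
    }
    return item.orphan();  // pass ownership to the caller
}
```

Neither exit path of `buildItem()` contains a `delete`; cleanup on the error path happens in the destructor of the local owner.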
1168
1169Every object must always be deletable/destructable. That is, at a minimum, all
1170pointers to owned memory must always be either NULL or point to owned objects.
1171
1172Internally:
1173
1174[cmemory.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/cmemory.h)
1175defines the `LocalMemory` class for chunks of memory of primitive types which
1176will be `uprv_free()`'ed.
1177
1178[cmemory.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/cmemory.h)
1179also defines `MaybeStackArray` and `MaybeStackHeaderAndArray` which automate
1180management of arrays.
1181
1182Use `CharString`
1183([charstr.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/charstr.h))
1184for `char *` strings that you build and modify.
1185
1186#### Global Inline Functions
1187
1188Global functions (non-class member functions) that are declared inline must be
1189made static inline. Some compilers will export symbols that are declared inline
1190but not static.
1191
1192#### No Declarations in the for() Loop Head
1193
Iterations through `for()` loops must not use declarations in the first part of
the loop. There have been two revisions for the scoping of these declarations
and some compilers do not comply with the latest scoping. Declarations of loop
variables should be outside these loops.
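
For illustration, a small hypothetical helper (`sumFirst`) written in this style:

```c++
#include <cstdint>

// Sums the first `count` values; the loop variable is declared before the loop.
int32_t sumFirst(const int32_t *values, int32_t count) {
    int32_t i;
    int32_t sum = 0;
    for (i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum;
}
```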
1198
1199#### Common or I18N
1200
1201Decide whether or not the module is part of the common or the i18n API
1202collection. Use the appropriate macros. For example, use
1203`U_COMMON_IMPLEMENTATION`, `U_I18N_IMPLEMENTATION`, `U_COMMON_API`, `U_I18N_API`.
1204See `utypes.h`.
1205
1206#### Constructor Failure
1207
If there is a reasonable chance that a constructor fails (for example, if the
constructor relies on loading data), then either it must use and set a
`UErrorCode` or the class needs to support an `isBogus()`/`setToBogus()` mechanism
like `UnicodeString` and `UnicodeSet`, and the constructor needs to set the object
to bogus if it fails.
1213
1214#### `UVector`, `UVector32`, or `UVector64`
1215
1216Use `UVector` to store arrays of `void *`; use `UVector32` to store arrays of
1217`int32_t`; use `UVector64` to store arrays of `int64_t`. Historically, `UVector`
1218has stored either `int32_t` or `void *`, but now storing `int32_t` in a `UVector`
1219is deprecated in favor of `UVector32`.
1220
1221### C Coding Guidelines
1222
1223This section describes the C-specific guidelines or conventions to use.
1224
1225#### Declare and define C APIs with both `U_CAPI` and `U_EXPORT2`
1226
1227All C APIs need to be **both declared and defined** using the `U_CAPI` and
1228`U_EXPORT2` qualifiers.
1229
```c++
U_CAPI int32_t U_EXPORT2
u_formatMessage(...);
```
1234
1235> :point_right: **Note**: Use `U_CAPI` before and `U_EXPORT2` after the return
1236type of exported C functions. Internal functions that are visible outside a
1237compilation unit need a `U_CFUNC` before the return type.
1238
1239#### Subdivide the Name Space
1240
1241Use prefixes to avoid name collisions. Some of those prefixes contain a 3- (or
1242sometimes 4-) letter module identifier. Very general names like
1243`u_charDirection()` do not have a module identifier in their prefix.
1244
* For POSIX replacements, the (all lowercase) POSIX function names start with
  "u_": `u_strlen()`.
* For other API functions, a 'u' is prepended, followed by the module
  identifier (if appropriate), an underscore '_', and the
  **mixed-case** function name. For example, use `u_charDirection()`,
  `ubidi_setPara()`.
* For types (struct, enum, union), a "U" is prepended, often as
  "`U<module identifier>`" directly before the type name, without an underscore.
  For example, use `UComparisonResult`.
* For #defined constants and macros, a "U_" is prepended, often as
  "`U<module identifier>_`" with an underscore before the uppercase macro
  name. For example, use `U_ZERO_ERROR`, `U_SUCCESS()`, and `UNORM_NFC`.
1257
1258#### Functions for Constructors and Destructors
1259
1260Functions that roughly compare to constructors and destructors are called
1261`umod_open()` and `umod_close()`. See the following example:
1262
```c++
U_CAPI UBiDi * U_EXPORT2
ubidi_open(void);

U_CAPI UBiDi * U_EXPORT2
ubidi_openSized(int32_t maxLength, int32_t maxRunCount, UErrorCode *pErrorCode);

U_CAPI void U_EXPORT2
ubidi_close(UBiDi *pBiDi);
```
1273
1274Each successful call to a `umod_open()` returns a pointer to an object that must
1275be released by the user/owner by calling the matching `umod_close()`.
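
A stand-alone sketch of such an open/close pair. The module identifier "sketch" and all names derived from it (`USketch`, `USketchStatus`, `usketch_open()`, `usketch_close()`) are invented for this illustration, and the sketch uses `malloc()`/`free()` directly where ICU code would use the `cmemory.h` functions.

```c++
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins; the names follow the u<module>_ pattern with an
// invented module identifier "sketch".
enum USketchStatus {
    USKETCH_ZERO_ERROR = 0,
    USKETCH_ILLEGAL_ARGUMENT_ERROR = 1,
    USKETCH_MEMORY_ALLOCATION_ERROR = 7
};

struct USketch {
    int32_t maxLength;
};

// "Constructor": returns NULL and sets a failure status if no object is created.
USketch *usketch_open(int32_t maxLength, USketchStatus *pStatus) {
    USketch *p;
    if (*pStatus != USKETCH_ZERO_ERROR) {
        return NULL;  // a previous call already failed
    }
    if (maxLength < 0) {
        *pStatus = USKETCH_ILLEGAL_ARGUMENT_ERROR;
        return NULL;
    }
    p = (USketch *)std::malloc(sizeof(USketch));
    if (p == NULL) {
        *pStatus = USKETCH_MEMORY_ALLOCATION_ERROR;
        return NULL;
    }
    p->maxLength = maxLength;
    return p;
}

// "Destructor": releases what usketch_open() returned.
void usketch_close(USketch *p) {
    std::free(p);
}
```

The library both allocates and frees the object, so the application never has to release it with its own heap functions.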
1276
1277#### C "Service Object" Types and LocalPointer Equivalents
1278
1279For every C "service object" type (equivalent to C++ class), we want to have a
1280[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)
1281equivalent, so that C++ code calling the C API can use the specific "smart
1282pointer" to implement the "[Resource Acquisition Is Initialization
1283(RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
1284idiom.
1285
1286For example, in `ubidi.h` we define the `UBiDi` "service object" type and also
1287have the following "smart pointer" definition which will call `ubidi_close()` on
1288destruction:
1289
```c++
// Use config switches like this only after including unicode/utypes.h
// or another ICU header.
#if U_SHOW_CPLUSPLUS_API

U_NAMESPACE_BEGIN

/**
 * class LocalUBiDiPointer
 * "Smart pointer" class, closes a UBiDi via ubidi_close().
 * For most methods see the LocalPointerBase base class.
 *
 * @see LocalPointerBase
 * @see LocalPointer
 * @stable ICU 4.4
 */
U_DEFINE_LOCAL_OPEN_POINTER(LocalUBiDiPointer, UBiDi, ubidi_close);

U_NAMESPACE_END

#endif
```
1312
1313#### Inline Implementation Functions
1314
1315Some, but not all, C compilers allow ICU users to declare functions inline
1316(which is a C++ language feature) with various keywords. This has advantages for
1317implementations because inline functions are much safer and more easily debugged
1318than macros.
1319
1320ICU *used to* use a portable `U_INLINE` declaration macro that can be used for
1321inline functions in C. However, this was an unnecessary platform dependency.
1322
1323We have changed all code that used `U_INLINE` to C++ (.cpp) using "inline", and
1324removed the `U_INLINE` definition.
1325
1326If you find yourself constrained by .c, change it to .cpp.
1327
1328All functions that are declared inline, or are small enough that an optimizing
1329compiler might inline them even without the inline declaration, should be
1330defined (implemented) – not just declared – before they are first used. This is
1331to enable as much inlining as possible, and also to prevent compiler warnings
1332for functions that are declared inline but whose definition is not available
1333when they are called.
1334
1335#### C Equivalents for Classes with Multiple Constructors
1336
1337In cases like `BreakIterator` and `NumberFormat`, instead of having several
1338different 'open' APIs for each kind of instances, use an enum selector.
1339
1340#### Source File Names
1341
1342Source file names for C begin with a 'u'.
1343
1344#### Memory APIs Inside ICU
1345
1346For memory allocation in C implementation files for ICU, use the functions and
1347macros in `cmemory.h`. When allocated memory is returned from a C API function,
1348there must be a corresponding function (like a `ucnv_close()`) that deallocates
1349that memory.
1350
1351All memory allocations in ICU should be checked for success. In the event of a
1352failure (a `NULL` returned from `uprv_malloc()`), a `U_MEMORY_ALLOCATION_ERROR` status
1353should be returned by the ICU function in question. If the allocation failure
1354leaves the ICU service in an invalid state, such that subsequent ICU operations
1355could also fail, the situation should be flagged so that the subsequent
1356operations will fail cleanly. Under no circumstances should a memory allocation
1357failure result in a crash in ICU code, or cause incorrect results rather than a
1358clean error return from an ICU function.
1359
1360#### // Comments
1361
1362C++ style // comments may be used in plain C files and in headers that will be
1363included in C files.
1364
1365## Source Code Strings with Unicode Characters
1366
1367### `char *` strings in ICU
1368
| Declared type | Encoding | Example | Used with |
| --- | --- | --- | --- |
| `char *` | varies with platform | `"Hello"` | Most ICU API functions taking `char *` parameters. Unless otherwise noted, characters are restricted to the "Invariant" set, described below. |
| `char *` | UTF-8 | `u8"¡Hola!"` | Only functions that are explicitly documented as expecting UTF-8. No restrictions on the characters used. |
| `UChar *` | UTF-16 | `u"¡Hola!"` | All ICU functions with `UChar *` parameters. |
| `UChar32` | Code point value | `U'😁'` | `UChar32` single code point constant. |
| `wchar_t` | unknown | `L"Hello"` | Not used with ICU. Unknown encoding, unknown size, not portable. |
1376
1377ICU source files are UTF-8 encoded, allowing any Unicode character to appear in
1378Unicode string or character literals, without the need for escaping. But, for
1379clarity, use escapes when plain text would be confusing, e.g. for invisible
1380characters.
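
For illustration, the following sketch writes an invisible character and a supplementary code point as escapes. It uses only standard C++11 literals and `std::u16string`; the constant names are made up for this example.

```cpp
#include <cassert>
#include <string>

// ZERO WIDTH SPACE (U+200B) is invisible in most editors, so it is
// written as an escape rather than pasted in as plain text.
const std::u16string kWithZeroWidthSpace = u"x\u200By";

// A supplementary code point written with a \U escape. In a UTF-16
// string it occupies two code units (a surrogate pair).
const std::u16string kSupplementary = u"\U0001F600";

// The same code point as a single 32-bit character constant.
const char32_t kCodePoint = U'\U0001F600';
```

Here `kWithZeroWidthSpace` has three code units, which a reader could not tell from an unescaped literal.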
1381
1382For convenience, ICU4C tends to use `char *` strings in places where only
1383"invariant characters" (a portable subset of the 7-bit ASCII repertoire) are
1384used. This allows locale IDs, charset names, resource bundle item keys and
1385similar items to be easily specified as string literals in the source code. The
1386same types of strings are also stored as "invariant character" `char *` strings
1387in the ICU data files.
1388
1389ICU has hard coded mapping tables in `source/common/putil.c` to convert invariant
1390characters to and from Unicode without using a full ICU converter. These tables
1391must match the encoding of string literals in the ICU code as well as in the ICU
1392data files.
1393
> :point_right: **Note**: ICU assumes that at least the invariant
characters always have the same codes as is common on platforms with the same
charset family (ASCII vs. EBCDIC). **ICU has not been tested on platforms where
this is not the case.**
1398
1399Some usage of `char *` strings in ICU assumes the system charset instead of
1400invariant characters. Such strings are only handled with the default converter
1401(See the following section). The system charset is usually a superset of the
1402invariant characters.
1403
1404The following are the ASCII and EBCDIC byte values for all of the invariant
1405characters (see also `unicode/utypes.h`):
1406
1407| Character(s) | ASCII | EBCDIC |
1408| --- | --- | --- |
1409| a..i | 61..69 | 81..89 |
1410| j..r | 6A..72 | 91..99 |
1411| s..z | 73..7A | A2..A9 |
1412| A..I | 41..49 | C1..C9 |
1413| J..R | 4A..52 | D1..D9 |
1414| S..Z | 53..5A | E2..E9 |
1415| 0..9 | 30..39 | F0..F9 |
1416| (space) | 20 | 40 |
1417| " | 22 | 7F |
1418| % | 25 | 6C |
1419| & | 26 | 50 |
1420| ' | 27 | 7D |
1421| ( | 28 | 4D |
1422| ) | 29 | 5D |
1423| \* | 2A | 5C |
1424| + | 2B | 4E |
1425| , | 2C | 6B |
1426| - | 2D | 60 |
1427| . | 2E | 4B |
1428| / | 2F | 61 |
1429| : | 3A | 7A |
1430| ; | 3B | 5E |
1431| < | 3C | 4C |
1432| = | 3D | 7E |
1433| > | 3E | 6E |
1434| ? | 3F | 6F |
1435| _ | 5F | 6D |
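
The table can be turned into a simple check, sketched here. This is not ICU's own implementation, and `isInvariantString()` is a hypothetical helper. It lists the characters in a string literal instead of testing byte ranges, so that the compiler encodes the list in the platform's own charset family.

```cpp
#include <cassert>
#include <cstring>

// Every character from the table above, spelled out explicitly.
// Letter ranges such as 'a'..'z' are not contiguous in EBCDIC, so
// range comparisons are avoided.
static const char kInvariantChars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " \"%&'()*+,-./:;<=>?_";

// Returns true if every character of the NUL-terminated string s is in
// the invariant set. An empty string counts as invariant.
bool isInvariantString(const char *s) {
    for (; *s != 0; ++s) {
        if (std::strchr(kInvariantChars, *s) == nullptr) {
            return false;
        }
    }
    return true;
}
```

Typical locale IDs and charset names pass this check, while strings containing characters such as `@` or `#` do not.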
1436
1437### Rules Strings with Unicode Characters
1438
1439In order to include characters in source code strings that are not part of the
1440invariant subset of ASCII, one has to use character escapes. In addition, rules
1441strings for collation, etc. need to follow service-specific syntax, which means
1442that spaces and ASCII punctuation must be quoted using the following rules:
1443
1444* Single quotes delineate literal text: `a'>'b` => `a>b`
1445* Two single quotes, either between or outside of single quoted text, indicate
1446  a literal single quote:
1447  * `a''b` => `a'b`
1448  * `a'>''<'b` => `a>'<b`
1449* A backslash precedes a single literal character:
1450* Several standard mechanisms are handled by `u_unescape()` and its variants.
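
The single-quote rules in the list above can be sketched as a small function. This is an illustration only, not ICU's parser: `unquoteRule()` is a hypothetical name, and backslash escapes and `u_unescape()` sequences are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes single-quote quoting from a rule string:
//  - two adjacent single quotes produce one literal single quote,
//    whether inside or outside quoted text;
//  - any other single quote opens or closes quoted text and is dropped;
//  - every other character is copied unchanged.
std::string unquoteRule(const std::string &rule) {
    std::string result;
    for (std::size_t i = 0; i < rule.size(); ++i) {
        const char c = rule[i];
        if (c != '\'') {
            result += c;
        } else if (i + 1 < rule.size() && rule[i + 1] == '\'') {
            result += '\'';
            ++i;  // skip the second quote of the pair
        }
        // else: a lone quote only toggles quoting, so nothing is emitted
    }
    return result;
}
```

The three examples from the list give `a>b`, `a'b`, and `a>'<b` respectively.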
1451
1452> :point_right: **Note**: All of these quoting mechanisms are supported by the
1453`RuleBasedTransliterator`. The single quote mechanisms (not backslash, not
1454`u_unescape()`) are supported by the format classes. In its infancy,
1455`ResourceBundle` supported the `\uXXXX` mechanism and nothing else.
1456This quoting method is the current policy. However, there are modules within
1457the ICU services that are being updated and this quoting method might not have
1458been applied to all of the modules.
1459
1460## Java Coding Conventions Overview
1461
1462The ICU group uses the following coding guidelines to create software using the
1463ICU Java classes and methods.
1464
1465### Code style
1466
1467The standard order for modifier keywords on APIs is:
1468
1469* `public static final synchronized strictfp`
1470* `public abstract`
1471
Do not use wildcard imports, such as "`import java.util.*`". The sort order of
import statements is `java` / `javax` / `org` / `com`. Within each top-level package
category, subpackages and classes are sorted in alphabetical order. We
recommend that ICU developers use the Eclipse IDE feature \[Source\] - \[Organize
Imports\] (Ctrl+Shift+O) to organize import statements.
1477
All if/else/for/while/do statements use braces, even if the controlled statement
is a single line. This is for clarity and to avoid mistakes due to bad nesting of
control statements, especially during maintenance.
1481
1482Tabs should not be present in source files.
1483
1484Indentation is 4 spaces.
1485
1486Make sure the code is formatted cleanly with regular indentation. Follow Java
1487style code conventions, e.g., don't put multiple statements on a single line,
1488use mixed-case identifiers for classes and methods and upper case for constants,
1489and so on.
1490
The Java source formatting rules described above are included in the Eclipse
project file. It is recommended to run \[Source\] - \[Format\] (Ctrl+Shift+F) in the
Eclipse IDE to clean up source files if necessary.
1494
Use UTF-8 encoding (without BOM) for Java source files.
1496
1497Javadoc should be complete and correct when code is checked in, to avoid playing
1498catch-up later during the throes of the release. Please javadoc all methods, not
1499just external APIs, since this helps with maintenance.
1500
1501### Code organization
1502
1503Avoid putting more than one top-level class in a single file. Either use
1504separate files or nested classes.
1505
Always define at least one constructor in a public API class. The Java compiler
automatically generates a no-arg constructor when a class has no explicit
constructors. We cannot provide proper API documentation for such default
constructors.
1510
Do not mix test, tool, and runtime code in the same file. If you need some
access to private or package methods or data, provide public accessors for them
and mark them `@internal`. Test code should be placed in the
`com.ibm.icu.dev.test` package, and tools (e.g., code that generates data, source
code, or computes constants) in the `com.ibm.icu.dev.tool` package. Occasionally,
for very simple cases, you can leave a few lines of tool code in the main source
and comment it out, but maintenance is easier if you just comment the location of
the tools in the source and put the actual code elsewhere.
1519
1520Avoid creating new interfaces unless you know you need to mix the interface into
1521two or more classes that have separate inheritance. Interfaces are impossible to
1522modify later in a backwards-compatible way. Abstract classes, on the other hand,
1523can add new methods with default behavior. Use interfaces only if it is required
1524by the architecture, not just for expediency.
1525
1526Current releases of ICU4J (since ICU 63) are restricted to use Java SE 7 APIs
1527and language features.
1528
1529### ICU Packages
1530
1531Public APIs should be placed in `com.ibm.icu.text`, `com.ibm.icu.util`, and
1532`com.ibm.icu.lang`. For historical reasons and for easier migration from JDK
1533classes, there are also APIs in `com.ibm.icu.math` but new APIs should not be
1534added there.
1535
1536APIs used only during development, testing, or tools work should be placed in
1537`com.ibm.icu.dev`.
1538
1539A class or method which is used by public APIs (listed above) but which is not
1540itself public can be placed in different places:
1541
15421. If it is only used by one class, make it private in that class.
15432. If it is only used by one class and its subclasses, make it protected in
1544   that class. In general, also tag it `@internal` unless you are working on a
1545   class that supports user-subclassing (rare).
15463. If it is used by multiple classes in one package, make it package private
1547   (also known as default access) and mark it `@internal`.
4. If it is used by multiple packages, make it public and place the class in
   the `com.ibm.icu.impl` package.
1550
1551### Error Handling and Exceptions
1552
1553Errors should be indicated by throwing exceptions, not by returning “bogus”
1554values.
1555
1556If an input parameter is in error, then a new
1557`IllegalArgumentException("description")` should be thrown.
1558
Exceptions should be caught only when something must be done, for example
special cleanup or rethrowing a different exception. If the error “should never
occur”, then throw a new `RuntimeException("description")` (rare). In this case,
a comment should be added with a justification.
1563
1564Use exception chaining: When an exception is caught and a new one created and
1565thrown (usually with additional information), the original exception should be
1566chained to the new one.
1567
1568A catch expression should not catch Throwable. Catch expressions should specify
1569the most specific subclass of Throwable that applies. If there are two concrete
1570subclasses, both should be specified in separate catch statements.
1571
1572### Binary Data Files
1573
1574ICU4J uses the same binary data files as ICU4C, in the big-endian/ASCII form.
1575The `ICUBinary` class should be used to read them.
1576
1577Some data sources (for example, compressed Jar files) do not allow the use of
1578several `InputStream` and related APIs:
1579
1580* Memory mapping is efficient, but not available for all data sources.
1581* Do not depend on `InputStream.available()`: It does not provide reliable
1582  information for some data sources. Instead, the length of the data needs to
1583  be determined from the data itself.
1584* Do not call `mark()` and `reset()` methods on `InputStream` without wrapping the
1585  `InputStream` object in a new `BufferedInputStream` object. These methods are
1586  not implemented by the `ZipInputStream` class, and their use may result in an
1587  `IOException`.
1588
1589### Compiler Warnings
1590
1591There should be no compiler warnings when building ICU4J. It is recommended to
1592develop using Eclipse, and to fix any problems that are shown in the Eclipse
1593Problems panel (below the main window).
1594
1595When a warning is not avoidable, you should add `@SuppressWarnings` annotations
1596with minimum scope.
1597
1598### Miscellaneous
1599
1600Objects should not be cast to a class in the `sun.*` packages because this would
1601cause a `SecurityException` when run under a `SecurityManager`. The exception needs
1602to be caught and default action taken, instead of propagating the exception.
1603
1604## Adding .c, .cpp and .h files to ICU
1605
1606In order to add compilable files to ICU, add them to the source code control
1607system in the appropriate folder and also to the build environment.
1608
1609To add these files, use the following steps:
1610
16111. Choose one of the ICU libraries:
1612   * The common library provides mostly low-level utilities and basic APIs that
1613     often do not make use of Locales. Examples are APIs that deal with character
1614     properties, the Locale APIs themselves, and ResourceBundle APIs.
   * The i18n library provides APIs that depend on or use Locales, such as for
     collation and formatting, that are most useful for internationalized user
     input and output.
16182. Put the source code files into the folder `icu/source/library-name`, then add
1619   them to the build system:
1620   * For most platforms, add the expected .o files to
1621     `icu/source/library-name/Makefile.in`, to the OBJECTS variable. Add the
1622     **public** header files to the HEADERS variable.
1623   * For Microsoft Visual C++ 6.0, add all the source code files to
1624     `icu/source/library-name/library-name.dsp`. If you don't have Visual C++, add
1625     the filenames to the project file manually.
16263. Add test code to `icu/source/test/cintltest` for C APIs and to
1627   `icu/source/test/intltest` for C++ APIs.
16284. Make sure that the API functions are called by the test code (100% API
1629   coverage) and that at least 85% of the implementation code is exercised by
1630   the tests (>=85% code coverage).
16315. Create test code for C using the `log_err()`, `log_info()`, and `log_verbose()`
1632   APIs from `cintltst.h` (which uses `ctest.h`) and check it into the appropriate
1633   folder.
16346. In order to get your C test code called, add its top level function and a
1635   descriptive test module path to the test system by calling `addTest()`. The
1636   function that makes the call to `addTest()` ultimately must be called by
1637   `addAllTests()` in `calltest.c`. Groups of tests typically have a common
1638   `addGroup()` function that calls `addTest()` for the test functions in its
1639   group, according to the common part of the test module path.
7. Add that test code to the build system also. Modify `Makefile.in` and the
   appropriate `.dsp` file (for example, the file for the library code).
1642
1643## C Test Suite Notes
1644
The cintltst Test Suite contains all the tests for the International Components
for Unicode C API. These tests may be run automatically by typing `cintltst` or
`cintltst -all` at the command line. The suite depends on the C Test Services
described in the next section.
1649
1650### C Test Services
1651
The purpose of the test services is to enable the writing of tests entirely in
C. The services have been designed to make creating tests or converting old ones
as simple as possible with a minimum of services overhead. A sample test file,
`demo.c`, is included below, following the `cintltst` usage text. For more
information regarding C test services, please see the
`icu4c/source/tools/ctestfw` directory.
1657
1658### Writing Test Functions
1659
1660The following shows the possible format of test functions:
1661
```c++
void some_test()
{
}
```
1667
1668Output from the test is accomplished with three printf-like functions:
1669
```c++
void log_err ( const char *fmt, ... );
void log_info ( const char *fmt, ... );
void log_verbose ( const char *fmt, ... );
```
1675
* `log_info()` writes informational messages to the console.
* `log_verbose()` writes to the console ONLY if the VERBOSE flag is turned
  on (for example, with the `-v` command line option). This option is useful for
  debugging. By default, the VERBOSE flag is turned OFF.
* `log_err()` can be called when a test failure is detected. The error is
  then logged and the error count is incremented by one.
1682
1683To use the tests, link them into a hierarchical structure. The root of the
1684structure will be allocated by default.
1685
```c++
TestNode *root = NULL; /* empty */
addTest( &root, &some_test, "/test");
```
1690
1691Provide `addTest()` with the function pointer for the function that performs the
1692test as well as the absolute 'path' to the test. Paths may be up to 127 chars in
1693length and may be used to group tests.
1694
1695The calls to `addTest` must be placed in a function or a hierarchy of functions
1696(perhaps mirroring the paths). See the existing cintltst for more details.
1697
1698### Running the Tests
1699
1700A subtree may be extracted from another tree of tests for the programmatic
1701running of subtests.
1702
```c++
TestNode* sub;
sub = getTest(root, "/mytests");
```
1707
1708And a tree of tests may be run simply by:
1709
```c++
runTests( root ); /* or 'sub' */
```
1713
1714Similarly, `showTests()` lists out the tests. However, it is easier to use the
1715command prompt with the Usage specified below.
1716
1717### Globals
1718
The command line parser resets the error count and prints a summary of the
failed tests. If `runTests` is called directly, however, the error count and
summary need to be managed manually. `ERROR_COUNT` contains the number of times
`log_err` was called. `runTests` resets the count to zero before running the
tests. `VERBOSITY` must be set to 1 to display `log_verbose()` data; otherwise,
it should be 0 (the default).
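
The interplay between these globals and the logging functions can be sketched as follows. This is not the ctestfw source: the `sketch_` names are stand-ins so that the block compiles on its own, and, unlike the real `log_verbose()`, the stand-in returns a flag so the effect of the verbosity setting is easy to observe.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Stand-ins for the ctestfw globals described above.
int sketchErrorCount = 0;  // plays the role of ERROR_COUNT
int sketchVerbosity = 0;   // plays the role of VERBOSITY (0 = off, 1 = on)

// Stand-in for log_err(): counts the failure, then prints the message.
void sketch_log_err(const char *fmt, ...) {
    ++sketchErrorCount;
    va_list args;
    va_start(args, fmt);
    std::vprintf(fmt, args);
    va_end(args);
}

// Stand-in for log_verbose(): prints only when verbosity is on.
// Returns true if the message was printed.
bool sketch_log_verbose(const char *fmt, ...) {
    if (sketchVerbosity == 0) {
        return false;
    }
    va_list args;
    va_start(args, fmt);
    std::vprintf(fmt, args);
    va_end(args);
    return true;
}
```

A driver that bypasses the command line parser would reset the counter before a run and inspect it afterwards to decide on an exit status.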
1725
1726### Building cintltst
1727
1728To compile this test suite using Microsoft Visual C++ (MSVC), follow the
1729instructions in `icu4c/source/readme.html#HowToInstall` for building the `allC`
1730workspace. This builds the libraries as well as the `cintltst` executable.
1731
1732### Executing cintltst
1733
To run the test suite from the command line, change directory to
`icu4c/source/test/cintltst/Debug` for the debug build (or
`icu4c/source/test/cintltst/Release` for the release build) and then type `cintltst`.
1737
1738### cintltst Usage
1739
1740Type `cintltst -h` to view its command line parameters.
1741
```text
### Syntax:
### Usage: [ -l ] [ -v ] [ -verbose] [-a] [ -all] [-n]
 [-no_err_msg] [ -h] [ /path/to/test ]
### -l To get a list of test names
### -all To run all the test
### -a To run all the test(same as -all)
### -verbose To turn ON verbosity
### -v To turn ON verbosity(same as -verbose)
### -h To print this message
### -n To turn OFF printing error messages
### -no_err_msg (same as -n)
### -[/subtest] To run a subtest
### For example to run just the utility tests type: cintltest /tsutil)
### To run just the locale test type: cintltst /tsutil/loctst
###
```

The following sample test file, `demo.c`, shows how tests are written and
registered:

```c++
/******************** sample ctestfw test ********************
********* Simply link this with libctestfw or ctestfw.dll ****
************************* demo.c *****************************/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "ctest.h"

/**
* Some sample dummy tests.
* the statics simply show how often the test is called.
*/
void mytest()
{
    static int i = 0;
    log_info("I am a test[%d]\n", i++);
}

void mytest_err()
{
    static int i = 0;
    /* each log_err() call increments the error count */
    log_err("I am a test containing an error[%d]\n", i++);
    log_err("I am a test containing an error[%d]\n", i++);
}

void mytest_verbose()
{
    /* will only show if verbose is on (-v) */
    log_verbose("I am a verbose test, blabbing about nothing at all.\n");
}

/**
* Add your tests from this function
*/

void add_tests( TestNode** root )
{
    addTest(root, &mytest, "/apple/bravo" );
    addTest(root, &mytest, "/a/b/c/d/mytest");
    addTest(root, &mytest_err, "/d/e/f/h/junk");
    addTest(root, &mytest, "/a/b/c/d/another");
    addTest(root, &mytest, "/a/b/c/etest");
    addTest(root, &mytest_err, "/a/b/c");
    addTest(root, &mytest, "/bertrand/andre/damiba");
    addTest(root, &mytest_err, "/bertrand/andre/OJSimpson");
    addTest(root, &mytest, "/bertrand/andre/juice/oj");
    addTest(root, &mytest, "/bertrand/andre/juice/prune");
    addTest(root, &mytest_verbose, "/verbose");

}

int main(int argc, const char *argv[])
{
    TestNode *root = NULL;

    add_tests(&root); /* address of root ptr- will be filled in */

    /* Run the tests. An int is returned suitable for the OS status code.
    (0 for success, neg for parameter errors, positive for the # of
    failed tests) */
    return processArgs( root, argc, argv );
}
```
1824
1825## C++ IntlTest Test Suite Documentation
1826
1827The IntlTest suite contains all of the tests for the C++ API of International
1828Components for Unicode. These tests may be automatically run by typing `intltest`
1829at the command line. Since the verbose option prints out a considerable amount
1830of information, it is recommended that the output be redirected to a file:
1831`intltest -v > testOutput`.
1832
1833### Building IntlTest
1834
1835To compile this test suite using MSVC, follow the instructions for building the
1836`alCPP` (All C++ interfaces) workspace. This builds the libraries as well as the
1837`intltest` executable.
1838
### Executing IntlTest
1840
To run the test suite from the command line, change directory to
`icu4c/source/test/intltest/Debug`, then type: `intltest -v >testOutput`. For the
release build, the executable will reside in the
`icu4c/source/test/intltest/Release` directory.
1845
### IntlTest Usage
1847
1848Type just `intltest -h` to see the usage:
1849
1850```text
1851### Syntax:
1852### IntlTest [-option1 -option2 ...] [testname1 testname2 ...]
1853### where options are: verbose (v), all (a), noerrormsg (n),
1854### exhaustive (e) and leaks (l).
1855### (Specify either -all (shortcut -a) or a test name).
1856### -all will run all of the tests.
1857###
1858### To get a list of the test names type: intltest LIST
1859### To run just the utility tests type: intltest utility
1860###
1861### Test names can be nested using slashes ("testA/subtest1")
1862### For example to list the utility tests type: intltest utility/LIST
1863### To run just the Locale test type: intltest utility/LocaleTest
1864###
1865### A parameter can be specified for a test by appending '@' and the value
1866### to the testname.
1867```
1868
1869## C: Testing with Fake Time
1870
The "Fake Time" capability allows ICU4C to be tested as if the hardware clock is
set to a specific time. This section documents how to use this facility.
Note that this facility requires the POSIX `gettimeofday` function to be
operable.
1875
1876This facility affects all ICU 'current time' calculations, including date,
1877calendar, time zone formats, and relative formats. It doesn't affect any calls
1878directly to the underlying operating system.
1879
18801. Build ICU with the **`U_DEBUG_FAKETIME`** preprocessor macro set. This can
1881   be accomplished with the following line in a file
1882   **icu/source/icudefs.local** :
1883
1884   ```shell
1885   CPPFLAGS+=-DU_DEBUG_FAKETIME
1886   ```
1887
2. Determine the `UDate` value (the time in milliseconds before or after
   midnight, January 1, 1970 GMT) which you want to use as the target. For this
   sample we will use the value `28800000`, which is midnight Pacific Standard
   Time on January 1, 1970.
18913. Set the environment variable `U_FAKETIME_START=28800000`
4. Now, the first time ICU checks the current time, it will start at midnight
   1/1/1970 (Pacific time) and roll forward. So, at the end of 10 seconds of
   program runtime, the clock will appear to be at 12:00:10.
18955. You can test this by running the utility '`icuinfo -m`' which will print out
1896   the 'Milliseconds since Epoch'.
6. You can also test this by running the cintltst test
   `/tsformat/ccaltst/TestCalendar` in verbose mode which will print out the
   current time:
1900
1901   ```shell
1902   $ make check ICUINFO_OPTS=-m U_FAKETIME_START=28800000 CINTLTST_OPTS=-v
1903   /tsformat/ccaltst/TestCalendar
1904   U_DEBUG_FAKETIME was set at compile time, so the ICU clock will start at a
1905   preset value
1906   env variable U_FAKETIME_START=28800000 (28800000) for an offset of
1907   -1281957858861 ms from the current time 1281986658861
1908   PASS: The current date and time fetched is Thursday, January 1, 1970 12:00:00
1909   ```
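
The arithmetic behind the log output above can be sketched as follows. This is an illustration consistent with the reported numbers, not ICU's actual implementation, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Offset, in milliseconds, that maps the real clock onto the fake clock.
// It is computed once, from the requested start value and the real time
// at that moment.
int64_t fakeTimeOffset(int64_t fakeStartMillis, int64_t realNowMillis) {
    return fakeStartMillis - realNowMillis;
}

// Fake "current time": the real time shifted by the fixed offset, so the
// fake clock keeps rolling forward at the normal rate.
int64_t fakeNow(int64_t realNowMillis, int64_t offsetMillis) {
    return realNowMillis + offsetMillis;
}
```

With the values from the log, `fakeTimeOffset(28800000, 1281986658861)` gives `-1281957858861`, and ten real seconds later `fakeNow()` reports `28810000`, that is, 12:00:10 Pacific time. The start value itself is 8 hours × 3600 seconds × 1000 milliseconds, the offset of Pacific Standard Time from GMT.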
1910
1911## C: Threading Tests
1912
Threading tests for ICU4C functions should be placed under utility /
`MultithreadTest`, in the files `intltest/tsmthred.h` and `.cpp`. See the existing
tests in these files for examples.
1916
1917Tests from this location are automatically run under the [Thread
1918Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
1919(TSAN) in the ICU continuous build system. TSAN will reliably detect race
1920conditions that could possibly occur, however improbable that occurrence might
1921be normally.
1922
Data races are among the most common and hardest-to-debug types of bugs in
concurrent systems. A data race occurs when two threads access the same variable
concurrently and at least one of the accesses is a write. The C++11 standard
defines data races as undefined behavior.
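
As a minimal sketch of the difference, the following function shares a counter between threads without a data race. It uses only the C++11 standard library rather than any ICU facility, and `countWithAtomics()` is a hypothetical name. It may need to be built with thread support enabled (for example `-pthread` on some toolchains).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Runs 'threadCount' threads that each increment one shared counter.
// std::atomic makes every increment a single indivisible operation.
// With a plain int in its place, the concurrent writes would be a data
// race, which is the kind of bug TSAN is designed to report.
int countWithAtomics(int threadCount, int incrementsPerThread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&counter, incrementsPerThread]() {
            for (int i = 0; i < incrementsPerThread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread &th : threads) {
        th.join();  // wait for every worker before reading the result
    }
    return counter.load();
}
```

Because no increment can be lost, the result always equals `threadCount * incrementsPerThread`.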
1927
1928## Binary Data Formats
1929
1930ICU services rely heavily on data to perform their functions. Such data is
1931available in various more or less structured text file formats, which make it
1932easy to update and maintain. For high runtime performance, most data items are
1933pre-built into binary formats, i.e., they are parsed and processed once and then
1934stored in a format that is used directly during processing.
1935
1936Most of the data items are pre-built into binary files that are then installed
1937on a user's machine. Some data can also be built at runtime but is not
1938persistent. In the latter case, a primary object should be built once and then
1939cloned to avoid the multiple parsing, processing, and building of the same data.
1940
1941Binary data formats for ICU must be portable across platforms that share the
1942same endianness and the same charset family (ASCII vs. EBCDIC). It would be
1943possible to handle data from other platform types, but that would require
1944load-time or even runtime conversion.
1945
1946### Data Types
1947
1948Binary data items are memory-mapped, i.e., they are used as readonly, constant
1949data. Their structures must be portable according to the criteria above and
1950should be efficiently usable at runtime without building additional runtime data
1951structures.
1952
Most native C/C++ data types cannot be used as part of binary data formats
because their sizes are not fixed across compilers. For example, an `int` could
be 16, 32, or 64 bits wide, or some other width entirely. Use only types with
precisely known widths and semantics.
1957
1958Use for example:
1959
1960* `uint8_t`, `uint16_t`, `int32_t` etc.
1961* `UBool`: same as `int8_t`
1962* `UChar`: for 16-bit Unicode strings
1963* `UChar32`: for Unicode code points
1964* `char`: for "invariant characters", see `utypes.h`
1965
1966> :point_right: **Note**: ICU assumes that `char` is an 8-bit byte but makes no
1967assumption about its signedness.
1968
1969**Do not use** for example:
1970
1971* `short`, `int`, `long`, `unsigned int` etc.: undefined widths
1972* `float`, `double`: undefined formats
1973* `bool`: undefined width and signedness
1974* `enum`: undefined width and signedness
1975* `wchar_t`: undefined width, signedness and encoding/charset
1976
1977Each field in a binary/mappable data format must be aligned naturally. This
1978means that a field with a primitive type of size n bytes must be at an n-aligned
1979offset from the start of the data block. `UChar` must be 2-aligned, `int32_t` must
1980be 4-aligned, etc.
1981
1982It is possible to use struct types, but one must make sure that each field is
1983naturally aligned, without possible implicit field padding by the compiler —
1984assuming a reasonable compiler.
1985
```c++
// bad because i will be preceded by compiler-dependent padding
// for proper alignment
struct BadExample {
    UBool flag;
    int32_t i;
};

// ok with explicitly added padding or generally conscious
// sequence of types
struct OKExample {
    UBool flag;
    uint8_t pad[3];
    int32_t i;
};
```
2002
2003Within the binary data, a `struct` type field must be aligned according to its
2004widest member field. The struct `OKExample` must be 4-aligned because it contains
2005an `int32_t` field. Make padding explicit via additional fields, rather than
2006letting the compiler choose optional padding.
2007
2008Another potential problem with `struct` types, especially in C++, is that some
2009compilers provide RTTI for all classes and structs, which inserts a `_vtable`
2010pointer before the first declared field. When using `struct` types with
2011binary/mappable data in C++, assert in some place in the code that `offsetof` the
2012first field is 0. For an example see the genpname tool.
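
Such layout assumptions can be checked at compile time, as in this sketch. It is not taken from the genpname tool. `SketchBool` is a stand-in for `UBool` (taken here to be `int8_t`-sized, as listed above), and the expected values assume a typical platform where `int32_t` is 4-aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for UBool so that the sketch compiles without ICU headers.
typedef int8_t SketchBool;

// Same shape as OKExample above: padding is explicit, so every field
// sits at a naturally aligned offset.
struct OKExample {
    SketchBool flag;
    uint8_t pad[3];
    int32_t i;
};

// Nothing hidden (such as a vtable pointer) precedes the first field.
static_assert(offsetof(OKExample, flag) == 0,
              "first field must be at offset 0");
// The int32_t field is 4-aligned within the struct.
static_assert(offsetof(OKExample, i) == 4,
              "int32_t field must be at a 4-aligned offset");
// The compiler added no padding of its own.
static_assert(sizeof(OKExample) == 8,
              "struct must not contain implicit padding");
// The struct as a whole is aligned like its widest member.
static_assert(alignof(OKExample) == 4,
              "struct must be 4-aligned");
```

If any assumption fails on a given compiler, the build stops instead of silently misreading mapped data at runtime.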
2013
2014### Versioning
2015
2016ICU data files have a `UDataHeader` structure preceding the actual data. Among
2017other fields, it contains a `formatVersion` field with four parts (one `uint8_t`
2018each). It is best to use only the first (major) or first and second
2019(major/minor) fields in the runtime code to determine binary compatibility,
2020i.e., reject a data item only if its `formatVersion` contains an unrecognized
2021major (or major/minor) version number. The following parts of the version should
2022be used to indicate variations in the format that are backward compatible, or
2023carry other information.
2024
2025For example, the current `uprops.icu` file's `formatVersion` (see the genprops tool
2026and `uchar.c`/`uprops.c`) is set to indicate backward-incompatible changes with the
2027major version number, backward-compatible additions with the minor version
2028number, and shift width constants for the `UTrie` data structure in the third and
2029fourth version numbers (these could change independently of the `uprops.icu`
2030format).
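
A compatibility check along these lines might look like the following sketch. The function name and the supported version number are hypothetical, and the four-element array stands in for the `formatVersion` field described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical major version that this runtime code understands.
const uint8_t kSupportedMajorVersion = 3;

// Accepts a data item if its major version is recognized. The minor and
// the third and fourth parts are ignored on purpose: they describe
// backward-compatible variations or carry other information.
bool isAcceptableFormat(const uint8_t formatVersion[4]) {
    return formatVersion[0] == kSupportedMajorVersion;
}
```

Code that needs a feature added in a later minor version would check `formatVersion[1]` as well, following the major/minor variant described above.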
2031
2032## C/C++ Debugging Hints and Tips
2033
2034### Makefile-based platforms
2035
* Use `Makefile.local` files (override of `Makefile`), or `icudefs.local` (at the
  top level, override of `icudefs.mk`) to avoid the need to modify
  change-controlled source files with debugging information.
  * Example: **`CPPFLAGS+=-DUDATA_DEBUG`** in common to enable data
    debugging
  * Example: **`CINTLTST_OPTS=/tscoll`** in the cintltst directory provides
    arguments to the cintltst test upon make check, to only run collation
    tests.
2044    * intltest: `INTLTEST_OPTS`
2045    * cintltst: `CINTLTST_OPTS`
2046    * iotest: `IOTEST_OPTS`
2047    * icuinfo: `ICUINFO_OPTS`
2048    * (letest does not have an OPTS variable as of ICU 4.6.)
2049
2050### Windows/Microsoft Visual Studio
2051
2052The following addition to autoexp.dat will cause **`UnicodeString`**s to be
2053visible as strings in the debugger without expanding sub-items:
2054
2055```text
2056;; Copyright (C) 2010 IBM Corporation and Others. All Rights Reserved.
2057;; ICU Additions
2058;; Add to {VISUAL STUDIO} \Common7\Packages\Debugger\autoexp.dat
2059;;   in the [autoexpand] section just before the final [hresult] section.
2060;;
2061;; Need to change 'icu_##' to the current major+minor (so icu_46 for 4.6.1 etc)
2062
2063icu_46::UnicodeString {
2064    preview        (
2065              #if($e.fFlags & 2)   ; stackbuffer
2066               (
2067                  #(
2068                "U= '",
2069                [$e.fUnion.fStackBuffer, su],
2070                "', len=",
2071                [$e.fShortLength, u]
2072                ;[$e.fFields.fArray, su]
2073               )
2074              )
2075              #else
2076               (
2077                  #(
2078                "U* '",
2079                [$e.fUnion.fFields.fArray, su],
2080                "', len=",
2081                [$e.fShortLength, u]
2082                ;[$e.fFields.fArray, su]
2083               )
2084              )
2085            )
2086
2087    stringview    (
2088              #if($e.fFlags & 2)   ; stackbuffer
2089               (
2090                  #(
2091                "U= '",
2092                [$e.fUnion.fStackBuffer, su],
2093                "', len=",
2094                [$e.fShortLength, u]
2095                ;[$e.fFields.fArray, su]
2096               )
2097              )
2098              #else
2099               (
2100                  #(
2101                "U* '",
2102                [$e.fUnion.fFields.fArray, su],
2103                "', len=",
2104                [$e.fShortLength, u]
2105                ;[$e.fFields.fArray, su]
2106               )
2107              )
2108            )
2109
2110}
2111;;;
2112;;; End ICU Additions
2113;;;
2114```
2115