This is NOT a page from the dpANS'94 document.
It is an experimental draft for public review, published at FORTH.SF.NET
RFC
2002


18. The optional LOCALE word set

See: www.mpeltd.demon.co.uk/arena.htm for the original i18n.propose.*.PDF


18.1 Introduction

Forth applications designed to run in many countries and languages cannot yet make enough assumptions about strings and character sets to be portable. The LOCALE word set is designed to provide words for portable internationalisation of application programs. The proposal does not attempt to cover text processing in general, but only to permit conversion of a limited set of application-defined text for internationalisation.

In practice, many applications are not localised by the software developer, but by their agents in other countries. The LOCALE word set permits the software developer to provide tools that produce text files that can be edited and converted to another language locally, without depending on tools specific to a computer language or operating system, such as resource compilers and managers. At the same time, the proposed word set does not inhibit the use of sets of statically compiled strings for each language; it simply does not define the mechanism.

The basis of the LOCALE word set is that all strings for internationalisation are compiled as LOCALE structures, and all access to these strings is through these structures. The following word set appears to be adequate as a first step. It is designed to cope with character sets whose characters are a different size from those of the native set.


18.2 Additional terms and notation

LOCALE
We use the word locale to mean the mixture of country, language, font, date/time formatting and so on in use when an application program runs.

DCS - Development Character Set
The language and character set encoding used by the Forth system at development time is referred to as the Development Character Set (DCS). The development character set is assumed never to change. It is furthermore assumed that character manipulation in the Forth system is defined in terms of the DCS, and that the action of character operations such as CMOVE is locked to the DCS.

See: OCS ACS


OCS - Operating Character Set
The language and character set encoding used by any underlying operating system is referred to as the Operating Character Set (OCS). The OCS may or may not be the same as the DCS.

See: DCS ACS


ACS - Application Character Set
The language and character set encoding used at application run time is referred to as the Application Character Set (ACS). It is assumed that the largest character in an ACS fits in the native cell of the development Forth system. The only use the LOCALE word set makes of individual characters is for setting the macro escape character (see later). The ACS may or may not be the same as the OCS.

See: DCS OCS


LOCALE structures / lsid
We do not wish to constrain or influence implementation techniques in any way. A specific string for internationalisation needs to be referred to by a single parameter, which we call the "locale string identifier", or lsid. This is an opaque type: the programmer should make no assumptions about what it means, except that different strings have different lsids. In many cases, an lsid may well be an address.

See: locale string


LOCALE strings / lstrings
At application run time, locale strings need to be manipulated. Locale strings are described in terms of address units. For brevity, locale strings are also referred to as lstrings.

See: locale structures


escape character
The character that delimits macro references in strings processed by SUBSTITUTE. A macro reference is a macro name enclosed in a pair of escape characters; SUBSTITUTE replaces it with the corresponding locale string defined by SET-MACRO. With % as the escape character, a string for substitution looks like
   Your balance at %time% on %date% is %currencyvalue%.
where time, date and currencyvalue are macro references to be expanded. A literal % is written by doubling the escape character: %% yields a single %.

See: locale structures


macro-names / references
Macro names and their expansions are usually kept in a separate wordlist, in the same way that ENVIRONMENT-WORDLIST is used to implement ENVIRONMENT? queries. If an entry exists it is executed, which leaves the system free to create complex expansions for macro names.

It is recommended that the macro names and parameter sequences of ISO WD2 15435 (or later) be used where appropriate.

An implementation based on a LOCALE-WORDLIST cannot be specifically recommended: SEARCH-WORDLIST would have to operate on ACS text on the target system in order to resolve the macro names that SUBSTITUTE finds in lstrings, and in cross-compilation environments the ACS may differ from the DCS of the wordlist on the development system.

See: locale structures


country and language constants

There are a number of standardisation efforts for country and language codes. Since the objective of this document is to provide for source portability of applications, we do not need to mandate numeric or string values, but only to define language and country source names that can be used as Forth words.

Assuming that text processing is mostly affected by language selection, and that formatting is heavily influenced by both country and corporate standards, we suggest that the country be defined by the ISO3166:1998 two-letter country codes (Alpha-2). For this standard an algorithm has been defined to produce unique numeric codes for each country. A set of language codes (ISO639:1998) also exists.

See: locale structures
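
As an illustration of language and country source names, the following sketch defines a few constants in standard Forth. The LANG- and COUNTRY- prefixes, the helper code2 and the packing of the two letters into one cell are assumptions of this example only; they are neither mandated by this proposal nor the ISO numeric codes.

```forth
\ Sketch only: names, prefixes and values are hypothetical.
\ The proposal mandates source names, not numeric values.

: code2 ( char1 char2 -- n )   \ pack a two-letter code into one cell
  SWAP 256 * + ;

CHAR e CHAR n code2 CONSTANT LANG-EN      \ ISO 639 "en"
CHAR d CHAR e code2 CONSTANT LANG-DE      \ ISO 639 "de"
CHAR G CHAR B code2 CONSTANT COUNTRY-GB   \ ISO 3166 Alpha-2 "GB"
CHAR D CHAR E code2 CONSTANT COUNTRY-DE   \ ISO 3166 Alpha-2 "DE"
```

A prefix keeps names such as DE from shadowing text that would otherwise be converted as a number when BASE is hexadecimal.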


Octets and Bytes

Since the vast majority of character sets are defined in terms of 8-bit units, commonly referred to as bytes or octets, it is likely that the implementation of any internationalisation code will require the presence of byte/octet access words, regardless of the underlying DCS character size.

The presence and definition of an octet/byte access mechanism is outside the scope of this proposal.

See: locale structures


18.3 Additional usage requirements


18.3.1 Basis

The following three words are defined to handle macro substitution of text and the output of application data such as date, time and currency. The normative definitions appear later in this proposal.
     SUBSTITUTE     \ i*x addr1 len1 addr2 len2 -- j*y addr2 len3
                      performs the macro substitution
     SET-MACRO      \ addr len(au) c-addr u --
                      defines a localised string
     SET-ESCAPE     \ locale-char --
                      sets the macro escape character

See: 17 string extensions word set.


18.3.2 Environmental queries

Append table 18.2 to table 3.5.

See: 3.2.6 Environmental queries

Table 18.2 - Environmental query strings

String            Value data type  Constant?  Meaning
------            ---------------  ---------  -------
LOCALE-EXT        flag             no         locale extensions word set present
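
A program might test for the extensions with the standard word ENVIRONMENT? as sketched below. The word name locale-ext? is an assumption of this example; treating an unrecognised query string as "not present" is a choice made here, not a requirement of the proposal.

```forth
\ Sketch only: locale-ext? is a hypothetical name.
\ ENVIRONMENT? returns false alone if the string is unknown,
\ otherwise the value (here a flag) followed by true.
: locale-ext? ( -- flag )
  S" LOCALE-EXT" ENVIRONMENT? 0= IF FALSE THEN ;
```

On a system that does not know the query string, locale-ext? returns false, so a program can fall back to its native strings.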


18.4 Additional documentation requirements


18.4.1 System documentation


18.4.1.1 Implementation-defined options


18.4.1.2 Ambiguous conditions

(two times invalid-lsid?)

18.4.1.3 Other system documentation


18.4.2 Program documentation


18.4.2.1 Environmental dependencies


18.4.2.2 Other program documentation


18.5 Compliance and labeling


18.5.1 STD Forth systems

The phrase "Providing the LOCALE word set" shall be appended to the label of any Standard System that provides all of the LOCALE word set.

The phrase "Providing name(s) from the LOCALE Extensions word set" shall be appended to the label of any Standard System that provides portions of the LOCALE Extensions word set.

The phrase "Providing the LOCALE Extensions word set" shall be appended to the label of any Standard System that provides all of the LOCALE and LOCALE Extensions word sets.


18.5.2 STD Forth programs

The phrase "Requiring the LOCALE word set" shall be appended to the label of Standard Programs that require the system to provide the LOCALE word set.

The phrase "Requiring name(s) from the LOCALE Extensions word set" shall be appended to the label of Standard Programs that require the system to provide portions of the LOCALE Extensions word set.

The phrase "Requiring the LOCALE Extensions word set" shall be appended to the label of Standard Programs that require the system to provide all of the LOCALE and LOCALE Extensions word sets.


18.6 Glossary


18.6.1 LOCALE words


18.6.1.0001 SET-LANGUAGE
LOCALE

        ( lang -- ior ) ; lang is a language code

Sets the current language for the LOCALE system. The ior is zero if the operation succeeds; otherwise it is a non-zero implementation-dependent value. If the operation does not succeed, the current language remains unchanged.

See: country and language constants


18.6.1.0002 GET-LANGUAGE
LOCALE

        ( -- lang ) ; lang is a language code

Returns the language code last set by SET-LANGUAGE. The default language is implementation defined.

See: country and language constants


18.6.1.0003 SET-COUNTRY
LOCALE

        ( country -- ior ) ; country is a country code

Sets the current country for the LOCALE system. The ior is zero if the operation succeeds; otherwise it is a non-zero implementation-dependent value. If the operation does not succeed, the current country remains unchanged.

See: country and language constants


18.6.1.0004 GET-COUNTRY
LOCALE

        ( -- country ) ; country is a country code

Returns the country code last set by SET-COUNTRY. The default country is implementation defined.

See: country and language constants
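
The behaviour described for SET-LANGUAGE and GET-LANGUAGE could be sketched in standard Forth as follows. The constants LANG-EN and LANG-DE, their values, the helper known-language?, the default language and the ior value -1 are all assumptions of this example; SET-COUNTRY and GET-COUNTRY would follow the same pattern.

```forth
\ Sketch only: the language codes, the default and the ior value
\ are hypothetical placeholders.
1 CONSTANT LANG-EN
2 CONSTANT LANG-DE

VARIABLE current-language          \ last language accepted
LANG-EN current-language !         \ implementation-defined default

: known-language? ( lang -- flag )
  DUP LANG-EN = SWAP LANG-DE = OR ;

: SET-LANGUAGE ( lang -- ior )
  DUP known-language? IF
    current-language !  0          \ success: ior is zero
  ELSE
    DROP  -1                       \ failure: language left unchanged
  THEN ;

: GET-LANGUAGE ( -- lang )  current-language @ ;
```

An application would typically check the ior, for example with LANG-DE SET-LANGUAGE THROW, before relying on the new setting.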


18.6.1.0005 L"
LOCALE
        Interpretation: Interpretation semantics for this word are undefined.

        Compilation: ( "original-string<quote>" -- )

Parse DCS characters to the final " (double-quote) and append the run-time semantics given below to the current definition.

        Run-time: ( -- lsid )

Return lsid, an identifier for a locale string. Other words use lsid to extract language-specific information.

See: DCS , lsid


18.6.1.0006 LOCALE@
LOCALE
        ( lsid -- addr len@au )

Return addr and len, the address and length in address units of the string (possibly converted to the current language) that corresponds to the DCS string identified by lsid.

See: DCS lsid lstrings
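
One possible representation of an lsid is sketched below in standard Forth, together with L", LOCALE@ and NATIVE@ built on it. It assumes that the ACS equals the DCS, that one character occupies one address unit, and that an lsid is the address of a structure held in a fixed pool. The names pool, pool-ptr, pool-allot, locale-head and TRANSLATION! are hypothetical helpers and not part of this proposal.

```forth
\ Sketch only. Hypothetical structure layout (lsid = its address):
\   link | translation addr | translation len | native len | native text

2048 CONSTANT /pool
CREATE pool  /pool ALLOT              \ storage kept apart from compiled code
VARIABLE pool-ptr     pool pool-ptr !
VARIABLE locale-head  0 locale-head !

: pool-allot ( n -- addr )            \ reserve n address units
  pool-ptr @ ALIGNED  TUCK +          ( addr end )
  DUP pool /pool + U> ABORT" LOCALE pool exhausted"
  pool-ptr ! ;

: L"  ( "ccc<quote>" -- ) ( run-time: -- lsid )
  [CHAR] " PARSE                      ( c-addr u )
  DUP CHARS 4 CELLS + pool-allot >R   ( c-addr u ) ( R: lsid )
  locale-head @ R@ !                  \ link to the previous structure
  R@ locale-head !
  0 R@ CELL+ !   0 R@ 2 CELLS + !     \ no translation installed yet
  DUP R@ 3 CELLS + !                  \ native length in characters
  R@ 4 CELLS + SWAP CHARS MOVE        \ copy the native DCS text
  R> POSTPONE LITERAL ; IMMEDIATE

: NATIVE@ ( lsid -- c-addr len )
  DUP 4 CELLS + SWAP 3 CELLS + @ ;

: LOCALE@ ( lsid -- addr len )
  DUP CELL+ @ IF                      \ a translation is installed
    DUP CELL+ @  SWAP 2 CELLS + @
  ELSE
    NATIVE@ CHARS                     \ fall back to the native text
  THEN ;

\ Hypothetical helper: attach translated text to an lsid.
: TRANSLATION! ( addr len lsid -- )
  TUCK 2 CELLS + !  CELL+ ! ;

: greeting ( -- lsid )      L" Hello" ;
: hallo    ( -- addr len )  S" Hallo" ;

greeting LOCALE@ TYPE                 \ native text, no translation yet
hallo greeting TRANSLATION!
greeting LOCALE@ TYPE                 \ text installed by TRANSLATION!
```

The structures are placed in a separate pool rather than at HERE because data space allocated while a colon definition is being compiled may be interleaved with the compiled code on some systems.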


18.6.1.0007 SUBSTITUTE
LOCALE
        ( i*x addr1 len1 addr2 len2 -- j*y addr2 len3 )

Perform macro substitution on the lstring at addr1/len1, placing the result in the lstring buffer at addr2/len2, and return addr2 and len3, the length of the resulting string. An ambiguous condition exists if the resulting string does not fit into addr2/len2, if macro text cannot be found, or if the lstring at addr2/len2 overlaps the lstring at addr1/len1. Macros may take parameters from the Forth data stack.

(why?) (mlg: if the macros take parameters from the data stack, the order in which the parameters appear becomes important. The grammar of the human language and the code that invokes SUBSTITUTE may then require the macros to be in different orders. What is worse, in inflective languages the values returned by macros may determine the word forms in the surrounding text.)

When a macro name delimited by escape characters (see SET-ESCAPE) is encountered by SUBSTITUTE, the following action occurs:

  1. if the name is a valid macro name, a locale and implementation dependent action occurs
  2. if the name is null, a single escape character is substituted
  3. in all other cases an ambiguous condition exists.
(why not leave the macro-name as is just chopping off the first or last macro escape character?)

See: SET-MACRO SET-ESCAPE lstrings


18.6.1.0008 SET-MACRO
LOCALE
        ( addr len c-addr u -- )

Define the localised string addr/len (length in address units) as the text to substitute for the macro whose name, in the development character set, is c-addr/u. If the macro does not exist it is created.

See: SUBSTITUTE SET-ESCAPE lstrings


18.6.1.0009 SET-ESCAPE
LOCALE
        ( locale-char -- )

Set the macro escape character to be the localised character locale-char. By default it is the ASCII % character if it is available in the application character set. (why ACS when macro-references are in DCS?)

See: SUBSTITUTE SET-MACRO lstrings
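
The interplay of SET-ESCAPE, SET-MACRO and SUBSTITUTE can be sketched in standard Forth as below. This is a minimal model, not a reference implementation: it assumes that the ACS equals the DCS with one character per address unit, it ignores macros that take stack parameters, and it resolves each ambiguous condition by aborting. All helper names (escape-char, macro-head, save-string, find-macro, emit-dst and so on) and the buffer result-buf are hypothetical. On a system that already defines a word named SUBSTITUTE, the sketch redefines it.

```forth
\ Sketch only. Macro entry layout: link | name (2 cells) | text (2 cells)
VARIABLE escape-char   CHAR % escape-char !
VARIABLE macro-head    0 macro-head !
VARIABLE dst   VARIABLE dst-len   VARIABLE dst-max

: SET-ESCAPE ( char -- )  escape-char ! ;
: escape? ( char -- flag )  escape-char @ = ;
: >name ( entry -- a-addr )  CELL+ ;
: >text ( entry -- a-addr )  3 CELLS + ;

: save-string ( addr len -- addr' len )   \ copy into data space
  HERE >R  DUP ALLOT  TUCK R@ SWAP MOVE  R> SWAP ;

: find-macro ( c-addr u -- entry | 0 )
  macro-head @
  BEGIN DUP WHILE                         ( c-addr u entry )
    >R 2DUP R@ >name 2@ COMPARE 0= IF  2DROP R> EXIT  THEN
    R> @                                  \ follow the link
  REPEAT  NIP NIP ;

: SET-MACRO ( addr len c-addr u -- )
  2DUP find-macro ?DUP IF
    NIP NIP                               ( addr len entry )
  ELSE
    save-string  ALIGN HERE >R
    macro-head @ ,  R@ macro-head !       \ link in the new entry
    , ,  0 , 0 ,  R>                      \ name stored, text empty
  THEN
  >R save-string R> >text 2! ;

: emit-dst ( char -- )
  dst-len @ dst-max @ U< 0= ABORT" SUBSTITUTE: result too long"
  dst @ dst-len @ CHARS + C!   1 dst-len +! ;

: type-dst ( addr len -- )
  0 ?DO  DUP I CHARS + C@ emit-dst  LOOP  DROP ;

: name-length ( addr len -- n )           \ characters before next escape
  0 ?DO
    DUP I CHARS + C@ escape? IF  DROP I UNLOOP EXIT  THEN
  LOOP
  DROP TRUE ABORT" SUBSTITUTE: unterminated macro name" ;

: SUBSTITUTE ( addr1 len1 addr2 len2 -- addr2 len3 )
  dst-max !  dst !  0 dst-len !           ( addr1 len1 )
  BEGIN DUP WHILE
    OVER C@ escape? IF
      1 /STRING  2DUP name-length >R      \ R: length of macro name
      R@ 0= IF
        escape-char @ emit-dst            \ null name: literal escape
      ELSE
        OVER R@ find-macro
        DUP 0= ABORT" SUBSTITUTE: unknown macro"
        >text 2@ type-dst
      THEN
      R> 1+ /STRING                       \ skip name and closing escape
    ELSE
      OVER C@ emit-dst  1 /STRING
    THEN
  REPEAT
  2DROP  dst @ dst-len @ ;

CREATE result-buf 80 CHARS ALLOT

: demo ( -- )
  S" 12:00" S" time" SET-MACRO
  S" 1 May" S" date" SET-MACRO
  S" At %time% on %date%: 100%% done" result-buf 80 SUBSTITUTE TYPE ;
demo                                      \ At 12:00 on 1 May: 100% done
```

SET-MACRO copies both the name and the text, so the caller's strings may be transient. Replacing the text of an existing macro allocates a fresh copy and abandons the old one, which keeps the sketch short at the cost of data space.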


18.6.2 LOCALE extension words


18.6.2.0010 LOCALE-INDEX
LOCALE EXT
       ( lsid -- )

Updates the internal data structure. Useful if structures are added and changes to internal structures are required.

See: lsid


18.6.2.0011 LOCALE-LINK
LOCALE EXT
       ( lsid1 -- lsid2 )

Given the address of one LOCALE structure, returns the address of the next.

See: lsid


18.6.2.0012 LOCALE-TYPE
LOCALE EXT
       ( addr len -- )

Displays the LOCALE string whose address and length in address units are given.

See: lstrings 18.6.1.0006 LOCALE@


18.6.2.0013 NATIVE@
LOCALE EXT
       ( lsid -- c-addr len )

Given the LOCALE structure identified by lsid, returns the address and length of the corresponding native DCS string that was compiled by L".

See: lstrings 18.6.1.0006 LOCALE@ 18.6.1.0005 L"

