This is NO page from the dpans'94 document.
It is an experimental draft for public review, here at FORTH.SF.NET (RFC 2002).
See: www.mpeltd.demon.co.uk/arena.htm with the original i18n.propose.*.PDF
Forth applications designed to run in many countries and languages cannot yet make enough assumptions about strings and character sets to be portable. The LOCALE word set is designed to provide words for portable internationalisation of application programs. The proposal does not attempt to cover text processing in general, but only to permit conversion of a limited set of application-defined text for internationalisation.
In practice, many applications are not localised by the software developer, but by their agents in other countries. The LOCALE word set permits the software developer to provide tools that produce text files which can be edited and converted to another language locally, without depending on programming-language or operating-system specific tools such as resource compilers and managers. At the same time, the proposed word set does not prevent the use of sets of statically compiled strings for each language; it simply does not define that mechanism.
The basis of the LOCALE word set is that all strings for internationalisation are compiled as LOCALE structures, and all access to these strings is through these structures. The word set below appears adequate as a first step. It is designed to cope with character sets whose character size differs from that of the native set.
CMOVE is locked to the DCS (development character set).
See:
locale string
See:
locale structures
SUBSTITUTE expands macro references, delimited by pairs of the escape character, to the corresponding locale strings given by SET-MACRO. With % as the escape character, a substitution string looks like:
Your balance at %time% on %date% is %currencyvalue%.
where time, date and currencyvalue are macro references to be expanded. A literal % is expressed by doubling the escape character as %%, which yields a single %.
See:
locale structures
It is recommended that the macro names and parameter sequences of ISO WD2 15435 (or later) be used where appropriate.
An implementation based on a LOCALE-WORDLIST cannot be specifically recommended: to use lstrings in SUBSTITUTE, SEARCH-WORDLIST would have to operate on ACS strings on the target system, while in cross-compilation environments the ACS may differ from the DCS of the wordlists on the development system.
See:
locale structures
There are a number of standardisation efforts for country and language codes. Since the objective of this document is to provide for source portability of applications, we do not need to mandate numeric or string values, but only to define language and country source names that can be used as Forth words.
Assuming that text processing is mostly affected by language selection, and that formatting is heavily influenced by both country and corporate standards, we suggest that the country be defined by the ISO3166:1998 two-letter country codes (Alpha-2). For this standard an algorithm has been defined to produce unique numeric codes for each country. A set of language codes (ISO639:1998) also exists.
See:
locale structures
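Since only source names are required, an implementation is free to choose the values behind them. A minimal sketch, assuming a hypothetical scheme that packs the two ASCII letters of an ISO 3166 Alpha-2 code into one cell (these values are not the ISO numeric codes); PACK-CODE and ISO-CODE are illustrative names, not part of this proposal:

```forth
\ Hypothetical scheme: pack a two-letter Alpha-2 code into one cell,
\ first letter in the high byte. Not mandated by this proposal.
: PACK-CODE ( c-addr u -- x )
  2 <> ABORT" country code must have two letters"
  DUP C@ 8 LSHIFT  SWAP CHAR+ C@ OR ;

\ ISO-CODE <name> defines <name> as a constant holding its own packed
\ letters: parse the name once for the value, rewind >IN, then let
\ CONSTANT parse it again as the new word's name.
: ISO-CODE ( "name" -- )
  >IN @  BL WORD COUNT PACK-CODE  SWAP >IN !  CONSTANT ;

ISO-CODE US   ISO-CODE GB   ISO-CODE DE   ISO-CODE FR
```

Language codes (ISO 639) could be packed the same way, but on case-insensitive Forth systems their names (for example de) would collide with the country names (DE), so they would need distinct spellings.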
Since the vast majority of character sets are defined in terms of 8-bit units commonly referred to as bytes or octets, it is likely that the implementation of any internationalisation code will require the presence of byte/octet access words, regardless of the underlying DCS character size.
The presence and definition of an octet/byte access mechanism is outside the scope of this proposal.
See:
locale structures
SUBSTITUTE  \ i*x addr1 len1 addr2 len2 -- j*y addr2 len3   performs the macro substitution
SET-MACRO   \ addr len(au) c-addr u --   defines the localised text of a macro
SET-ESCAPE  \ locale-char --   sets the macro escape character
See:
17 string extensions word set.
See:
3.2.6 Environmental queries
Table 18.2 - Environmental query strings

String        Value data type   Constant?   Meaning
------        ---------------   ---------   -------
LOCALE-EXT    flag              no          locale extensions word set present
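A Standard Program can test for the word set before relying on it. A minimal sketch using ENVIRONMENT? with the query string from Table 18.2; the name locale-ext? is hypothetical:

```forth
\ locale-ext? is a hypothetical helper name.
: locale-ext? ( -- flag )
  S" LOCALE-EXT" ENVIRONMENT?   \ false | value true
  IF ELSE FALSE THEN ;          \ unknown query: treat the word set as absent
```

On a system that does not recognise the query string, ENVIRONMENT? returns false and the helper reports the word set as absent.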
SUBSTITUTE
The phrase "Providing name(s) from the LOCALE Extensions word set" shall be appended to the label of any Standard System that provides portions of the LOCALE Extensions word set.
The phrase "Providing the LOCALE Extensions word set" shall be appended to the label of any Standard System that provides all of the Environment and LOCALE Extensions word sets.
The phrase "Requiring name(s) from the LOCALE Extensions word set" shall be appended to the label of Standard Programs that require the system to provide portions of the LOCALE Extensions word set.
The phrase "Requiring the LOCALE Extensions word set" shall be appended to the label of Standard Programs that require the system to provide all of the Environment and LOCALE Extensions word sets.
18.6.1.0001 SET-LANGUAGE
( lang -- ior ) ; lang is a language code
Sets the current language for the LOCALE system. The ior is zero (false) if the operation succeeds; otherwise it is a non-zero implementation-dependent ior. If the operation does not succeed, the current language remains unchanged.
See:
country and language constants
18.6.1.0002 GET-LANGUAGE
( -- lang ) ; lang is a language code
Returns the language code last set by SET-LANGUAGE.
The default language is implementation defined.
See:
country and language constants
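A minimal sketch of the ior convention for SET-LANGUAGE and GET-LANGUAGE, assuming a hypothetical system that supports only two languages, held as packed two-letter codes; current-language, lang-en, lang-de, supported-language? and the ior value -1 are illustrative assumptions:

```forth
\ Sketch only: the supported languages and the ior value are hypothetical.
CHAR e 8 LSHIFT CHAR n OR CONSTANT lang-en    \ packed "en"
CHAR d 8 LSHIFT CHAR e OR CONSTANT lang-de    \ packed "de"
VARIABLE current-language
lang-en current-language !                    \ implementation-defined default

: supported-language? ( lang -- flag )
  DUP lang-en =  SWAP lang-de =  OR ;

: SET-LANGUAGE ( lang -- ior )
  DUP supported-language? IF current-language ! 0 EXIT THEN
  DROP -1 ;                  \ failure: non-zero ior, language unchanged

: GET-LANGUAGE ( -- lang )  current-language @ ;
```

SET-COUNTRY and GET-COUNTRY would follow the same pattern with a separate variable.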
18.6.1.0003 SET-COUNTRY
( country -- ior ) ; country is a country code
Sets the current country for the LOCALE system. The ior is zero (false) if the operation succeeds; otherwise it is a non-zero implementation-dependent ior. If the operation does not succeed, the current country remains unchanged.
See:
country and language constants
18.6.1.0004 GET-COUNTRY
( -- country ) ; country is a country code
Returns the country code last set by SET-COUNTRY.
The default country is implementation defined.
See:
country and language constants
18.6.1.0005 L"
Interpretation: Interpretation semantics for this word are undefined.
Compilation: ( "original-string<quote>" -- )
Parse DCS characters up to the terminating " (double-quote) and append the run-time semantics given below to the current definition.
Run-time: ( -- lsid )
Return lsid, an identifier for a locale string. Other words use lsid to extract language-specific information.
18.6.1.0006 LOCALE@
( lsid -- addr len(au) )
Return the address and length in address units of the string (possibly converted to the current language) that corresponds to the DCS string identified by lsid. This is the character string specification for the locale string stored at addr; the returned addr is the same for these strings. len covers the locale characters up to the terminating null character, which is the length of the string.
18.6.1.0007 SUBSTITUTE
( i*x addr1 len1 addr2 len2 -- j*y addr2 len3 )
Perform macro substitution on the lstring at addr1/len1, placing the result in the lstring buffer at addr2/len2 and returning addr2 and len3, the length of the resulting string. An ambiguous condition exists if the resulting string does not fit into addr2/len2, if macro text cannot be found, or if the lstring at addr2/len2 overlaps the lstring at addr1/len1. Macros may take parameters from the Forth data stack. (why?) (mlg: if the macros take parameters from the data stack, the order in which the parameters appear becomes important. If macros take parameters, then the grammar of the human language and the code that invokes SUBSTITUTE may require the macros to be in a different order. (What's worse, in inflective languages the values returned by macros may determine the word forms in the surrounding text.) )
When a macro name delimited by escape characters (see SET-ESCAPE) is encountered by SUBSTITUTE, the following action occurs:
See:
SET-MACRO
SET-ESCAPE
lstrings
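A minimal sketch of SUBSTITUTE, assuming that one locale character is one DCS character and one address unit (so lengths in address units equal character counts), that macros take no parameters from the data stack (so i*x and j*y are empty), and that the ambiguous conditions are reported with ABORT". The fixed two-entry find-macro stands in for the table SET-MACRO would build; it and the other lower-case helper names are hypothetical:

```forth
\ Sketch only: 1 locale char = 1 DCS char = 1 address unit,
\ macros take no stack parameters.
VARIABLE escape   CHAR % escape !      \ what SET-ESCAPE would store

\ Hypothetical stand-in for the table built by SET-MACRO.
: find-macro ( c-addr u -- c-addr2 u2 true | false )
  2DUP S" time" COMPARE 0= IF 2DROP S" 12:00" TRUE EXIT THEN
  2DUP S" date" COMPARE 0= IF 2DROP S" 1 May" TRUE EXIT THEN
  2DROP FALSE ;

VARIABLE dst   VARIABLE dst-len   VARIABLE dst-max

: out-char ( char -- )                 \ append one character to the result
  dst-len @ dst-max @ < 0= ABORT" SUBSTITUTE: result too long"
  dst @ dst-len @ + C!  1 dst-len +! ;

: out-string ( c-addr u -- )
  0 ?DO DUP I + C@ out-char LOOP DROP ;

: escape-offset ( c-addr u -- n )      \ offset of next escape char, or -1
  0 ?DO DUP I + C@ escape @ = IF DROP I UNLOOP EXIT THEN LOOP DROP -1 ;

: SUBSTITUTE ( addr1 len1 addr2 len2 -- addr2 len3 )
  dst-max !  dst !  0 dst-len !
  BEGIN DUP 0> WHILE                   ( a u )
    OVER C@ escape @ <> IF
      OVER C@ out-char  1 /STRING      \ ordinary character: copy it
    ELSE
      1 /STRING                        \ skip the opening escape
      2DUP escape-offset               ( a u n )
      DUP 0< ABORT" SUBSTITUTE: unterminated macro name"
      ?DUP 0= IF                       \ doubled escape: emit a single one
        escape @ out-char  1 /STRING
      ELSE                             \ n = length of the macro name
        >R  OVER R@ find-macro  0= ABORT" SUBSTITUTE: unknown macro"
        out-string  R> 1+ /STRING      \ skip the name and closing escape
      THEN
    THEN
  REPEAT  2DROP  dst @ dst-len @ ;
```

Keeping the output state in variables keeps the loop readable; a re-entrant version would keep it on the stacks instead.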
18.6.1.0008 SET-MACRO
( addr len c-addr u -- )
Define the localised string addr/len (length in address units) as the text to substitute for the macro whose name, in the development character set, is c-addr/u. If the macro does not exist, it is created.
See:
SUBSTITUTE
SET-ESCAPE
lstrings
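A minimal sketch of SET-MACRO, assuming the macros are kept in a linked list in data space, that the name is copied with CMOVE as DCS text, and that the value string is stored by reference (the caller must keep it valid). The node layout and the names macros, find-node, node-name and macro@ are hypothetical; macro@ shows the lookup SUBSTITUTE would need:

```forth
\ Sketch only: value strings are stored by reference, not copied.
VARIABLE macros   0 macros !           \ head of the macro list
\ Node layout: link | value-addr | value-len | name-len | name characters

: node-name ( node -- c-addr u )  3 CELLS + DUP CELL+ SWAP @ ;

: find-node ( c-addr u -- node | 0 )   \ search the list for a macro name
  macros @
  BEGIN DUP WHILE
    >R 2DUP R@ node-name COMPARE 0= IF 2DROP R> EXIT THEN
    R> @
  REPEAT  NIP NIP ;

: SET-MACRO ( addr len c-addr u -- )
  2DUP find-node ?DUP IF               \ existing macro: reuse its node
    NIP NIP
  ELSE                                 \ new macro: build a node
    ALIGN HERE >R
    macros @ ,  0 , 0 ,  DUP ,         \ link, value placeholders, name length
    HERE OVER CHARS ALLOT SWAP CMOVE   \ copy the DCS name
    R@ macros !  R>
  THEN                                 ( addr len node )
  >R  R@ 2 CELLS + !  R> CELL+ ! ;     \ store value length, then address

: macro@ ( c-addr u -- addr len true | false )   \ lookup for SUBSTITUTE
  find-node ?DUP IF DUP CELL+ @ SWAP 2 CELLS + @ TRUE ELSE FALSE THEN ;
```

Redefining an existing macro replaces its text in place, so the list never holds two entries with the same name.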
18.6.1.0009 SET-ESCAPE
( locale-char -- )
Set the macro escape character to be the localised character locale-char. By default it is the ASCII % character if it is available in the application character set. (why ACS when macro-references are in DCS?)
See:
SUBSTITUTE
SET-MACRO
lstrings
18.6.2.0010 LOCALE-INDEX
( lsid -- )
Updates the internal data structure for lsid. This is useful when structures are added or when changes to internal structures are required.
See:
lsid
18.6.2.0011 LOCALE-LINK
( lsid1 -- lsid2 )
Given the address of one LOCALE structure, returns the address of the next.
See:
lsid
18.6.2.0012 LOCALE-TYPE
( addr len -- )
Displays the LOCALE string whose address and length in address units are given.
See:
lstrings
18.6.1.0006 LOCALE@
18.6.2.0013 NATIVE@
( lsid -- c-addr len )
Given the LOCALE structure, returns the address and length of the corresponding DCS native string that was compiled by L"
See:
lstrings
18.6.1.0006 LOCALE@
18.6.1.0005 L"
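A minimal sketch of how L", NATIVE@, LOCALE-LINK and LOCALE@ could fit together, assuming a hypothetical structure layout (link cell, length cell, native characters) kept in a fixed pool so that L" can build structures while a definition is being compiled. No translations are loaded, so LOCALE@ simply falls back to the native text; the pool, lallot, make-lstruct and count-lstrings are illustrative names, not part of the proposal:

```forth
\ Sketch only. Layout of a structure: link | length | native characters.
1024 CONSTANT pool-size
CREATE lpool  pool-size ALLOT
VARIABLE lpool-here   lpool lpool-here !
VARIABLE locale-list  0 locale-list !   \ most recently compiled structure

: lallot ( n -- addr )                  \ aligned bump allocation in the pool
  lpool-here @ ALIGNED  TUCK +
  DUP lpool pool-size + > ABORT" locale pool full"
  lpool-here ! ;

: make-lstruct ( c-addr u -- lsid )     \ copy a DCS string into a new structure
  DUP CHARS 2 CELLS + lallot >R
  locale-list @ R@ !                    \ link to the previous structure
  DUP R@ CELL+ !                        \ native length in characters
  R@ 2 CELLS + SWAP CMOVE               \ native characters
  R@ locale-list !  R> ;

: L" ( "ccc<quote>" -- ) ( -- lsid )    \ compile-only
  [CHAR] " PARSE make-lstruct POSTPONE LITERAL ; IMMEDIATE

: NATIVE@ ( lsid -- c-addr len )  DUP 2 CELLS + SWAP CELL+ @ ;
: LOCALE-LINK ( lsid1 -- lsid2 )  @ ;   \ 0 marks the end of the list
: LOCALE@ ( lsid -- addr len )          \ no translation: native text, len in au
  NATIVE@ CHARS ;

: count-lstrings ( -- n )               \ walk the list with LOCALE-LINK
  0 locale-list @ BEGIN ?DUP WHILE SWAP 1+ SWAP LOCALE-LINK REPEAT ;
```

For example, : greeting L" Hello" ; compiles one structure, and greeting NATIVE@ TYPE displays its native text. Building structures in a separate pool avoids writing data into the middle of the definition being compiled.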