(c) M.L.Gassanenko, 1999
This paper may be distributed freely in hard copy or electronic form provided that it is not changed, and a reference to the original publication is given. Citations (and partial reproduction) are allowed, but they must not misrepresent the intent of this paper, and a reference to the whole document must be given. (The purpose of this requirement is to guard against releases of incompatible "improvements" of this specification, because this would be a hindrance to the primary purpose of this document, portability of return address manipulations.)

Document: The proposed Open Interpreter Wordset, part 1 of 2.
This file: OI-norma.doc, Second part: OI-ratio.doc
History:
Version 1.0:
Gassanenko M.L. Open Interpreter: Portability of Return Stack Manipulations. Proc. of the euroFORTH'98 Conf., Sep. 18-21 1998.
Version 2.0: Feb-May 1999. Both a paper for FD and a proposal for the ANSI/ISO Forth standard.
Version 2.1: 29 June 1999. Added R-SAVE-SYS/R-RESTORE-SYS.
Version 2.2: 20 Oct 1999. Grammar corrections, better formulations, etc.
Version 2.3: 11 May 2002. Introduced all corrections into this file. There was an attempt to publish this document in FD. It was expected to appear there, but as far as I know it has not, even in the last issue of FD (the magazine is no longer published). The corrections that were in separate files are now here.
Version 2.4: 04 June 2002. Added the "How to use this specification" section.
Fonts used: Times New Roman (12pt) - normal text, Times New Roman Italic (12pt) - stack diagram symbols, new terms, Times New Roman Bold (12pt) - titles when referenced, titles outside the OI spec., Arial (12pt) - used in index lines, Arial Bold (12pt) - titles within the OI spec., Courier New Bold (12pt) - Forth text, Times New Roman Bold Italic (12pt) - text that must not go to the standard.
Font sizes: 12pt (in headers outside OI spec.: 14pt and 18pt), but no special meaning is assigned to font size. "Tables" may not display properly with a different font size.

The Open Interpreter Word Set
M.L.Gassanenko (Ph.D.)
mlg@forth.org

Abstract

The concept of Open Interpreter makes techniques that change the control flow by modifying the return stack architecture-independent.
The five classes of open interpreter systems allow programmers to choose the most adequate degree of compromise between portability and convenience of programming. The Open Interpreter specification presented in this paper may be used as an additional chapter to the ANSI/ISO standard.

1. The purpose of this paper

The purpose of this paper is to introduce a specification that would allow portable use of techniques that are currently (in March 1999) outside the scope of the ANS Forth standard. They are: manipulations with return addresses, backtracking, keeping literals in threaded code, and user-defined control structures (ANS Forth supports the latter in a restricted way). Such techniques as user control over code generation, dynamic code generation, and de-compilation will also benefit.

The value of some of the mentioned techniques is arguable, but, in fact, sufficient motivation is provided by the following two items:
1) portability of return address manipulations (which, in particular, means portability of backtracking);
2) portability of implementation techniques (in particular, of access to literals in threaded code). Portability of implementation techniques is valuable for cross-compilers and embedded systems: people often need to port a system to a new target keeping its internals the same.

To prevent possible misinterpretation, I have to expand on the second item. It is good when implementation tools are portable. They will not be as portable as Core words, and the structure of the standard with the Open Interpreter specification reflects this: code that, for example, accesses in-line data requires the system to support the Core word set, plus the optional Open Interpreter word set, plus the optional Open Interpreter In-line Data Access word set. It is up to the programmer to realize that one method is less portable than another, and to use it adequately. It is bad style to mix low-level and application-level code, but a programming language standard cannot and must not prevent bad style.

The Open Interpreter word set has been proposed for inclusion into the standard, but first of all, the procedure requires this item to be included in the agenda of the Technical Committee (TC). It is possible that the TC will not be willing to spend time on it. On the other hand, portability of the mentioned Forth techniques and inclusion of the corresponding words into the standard are related, but different, goals. The proposed specification works even if it does not become part of the standard.

2. The approach

Let us formulate the main contradiction:
·         the "classical" architecture is backed by wide common practice; it is both simple and adequate to the techniques of return address manipulation, but there are also non-classical architectures, and therefore code written for the "classical" model is not very portable;
·         it is possible to write programs as if the return address size were unknown; such code will be portable, but cumbersome. This approach is not justified if the program will never be ported to a system with return addresses wider than one cell; in addition, double-cell return addresses are not widely used today;
·         compromise, "intermediate" solutions may be adequate for some architectures, but such compromises lose both advantages: they are neither backed by wide common practice nor widely portable.

The solution is to introduce multiple classes of Open Interpreter systems (namely, five). A "classical" system is of Class 1, while a Class 5 system may be a Harvard-architecture system, may have multiple-cell return addresses, and may have code and data memory address units of different sizes. A Class 1 system may be considered a particular case of a Class 5 system.

Code written for higher classes may run on systems of lower classes, but not vice versa. Therefore, programs written for higher classes are more portable. In exchange, programming for lower classes is less cumbersome (here 'cumbersome' means 'inadequately complex').

An important requirement is that each lower class is a subclass of all higher classes. It guarantees that any Open Interpreter system belongs to one and only one minimal class. Otherwise it would be possible to consider the same system as a particular case of two different, incompatible architectures (each architecture implies the use of its own protocol, and the system would be able to implement either the first or the second protocol, but not both). Two portions of code written for the same system but assuming it to belong to different classes (that is, to implement different protocols) would be incompatible, which is absurd. Therefore, out of any two classes, one class must be a subclass of the other (one of the two protocols will have to include the other).

The Open Interpreter class of a system (N) is the class with the minimal ordinal number N (and maximal requirements) which the system can implement.

A system is said to be of the Open Interpreter Class N if it complies with the requirements for systems of classes from N to 5 and does not comply with the requirements for other classes.

One more problem is related to stack manipulations:
·         if the size of a return address is greater than one cell and unknown, too many stack operators are needed to manipulate stack items of various sizes;
·         programs that assume one-cell return addresses are not very portable.

The solution is to use the return stack for data rearrangement. Return addresses come from the return stack and go to the return stack, and in most cases changes affect only the two top elements. Therefore, the following set is enough:
>RR , move to the return stack;
RR> , move from the return stack;
RR@ , copy from the return stack;
RRDROP , remove from the return stack;
>RR< , exchange the data stack top with the return stack top.
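To see why these five words are enough even when the return address width is unknown, here is a small Python model. It is a sketch, not part of the specification: the class Stacks, its method names and the parameter ra_cells are hypothetical, the stack effects are my reading of the one-line descriptions above, and the model ignores the difference between the data stack and return stack representations of code pointers.

```python
# Toy model of the return-stack words (hypothetical names, not from the
# specification). A code pointer occupies ra_cells cells on either stack:
# 1 on Classes 1-3, implementation-defined on Classes 4 and 5.


class Stacks:
    def __init__(self, ra_cells):
        self.ra_cells = ra_cells  # cells per return address
        self.ds = []              # data stack (list of cells, top at end)
        self.rs = []              # return stack (list of cells, top at end)

    def _pop_item(self, stack):
        """Remove one code pointer (ra_cells cells) from the stack top."""
        n = self.ra_cells
        if len(stack) < n:
            raise IndexError("stack underflow")
        item = stack[-n:]
        del stack[-n:]
        return item

    def to_rr(self):       # >RR    ( cp -- )      ( R: -- cp )
        self.rs.extend(self._pop_item(self.ds))

    def rr_from(self):     # RR>    ( -- cp )      ( R: cp -- )
        self.ds.extend(self._pop_item(self.rs))

    def rr_fetch(self):    # RR@    ( -- cp )      ( R: cp -- cp )
        if len(self.rs) < self.ra_cells:
            raise IndexError("stack underflow")
        self.ds.extend(self.rs[-self.ra_cells:])

    def rr_drop(self):     # RRDROP                ( R: cp -- )
        self._pop_item(self.rs)

    def to_rr_swap(self):  # >RR<   ( cp1 -- cp2 ) ( R: cp2 -- cp1 )
        from_ds = self._pop_item(self.ds)
        from_rs = self._pop_item(self.rs)
        self.rs.extend(from_ds)
        self.ds.extend(from_rs)


def rswap(m):
    """Exchange the two top return stack items: RR> >RR< >RR"""
    m.rr_from()
    m.to_rr_swap()
    m.to_rr()


for width in (1, 2):
    m = Stacks(width)
    m.rs = ["a"] * width + ["b"] * width
    rswap(m)
    print(width, m.rs)
# prints:
# 1 ['b', 'a']
# 2 ['b', 'b', 'a', 'a']
```

The same three-word sequence swaps the two top return stack items whatever the value of ra_cells, so a program written this way never needs to know the return address size.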

3. The result

With the Open Interpreter specification, return address manipulations become portable across Open Interpreter systems.
The five classes of open interpreter systems allow programmers to choose the most adequate degree of compromise between portability and convenience of programming.

Portability of return address manipulations enables one to use the following techniques to develop portable Forth code running on a variety of platforms:
1) user-defined (application-specific) methods of code execution, including backtracking;
2) data execution (data-driven approach);
3) user-defined (application-specific) control structures, including those for the techniques mentioned above;
4) access to parameters stored in threaded code via the return stack (it is a widely used and therefore important implementation technique).

Among application areas, we should mention distributed artificial intelligence and cross-compilers (tools for programming embedded systems).

4. Document organization

The document contains references to sections of the ANS Forth Standard (ANSI X3.215-1994 American National Standard for Information Systems - Programming Languages - Forth, American National Standards Institute, Inc., 1994; also recognized as an ISO standard); for example, "1.3 Document organization" refers to section 1.3 of the ANS Forth standard. References to sections of the Open Interpreter word set specification all begin with OI; sections of this paper that do not belong to the Open Interpreter specification are not referenced.

The glossary entries are organized according to ANS Forth rules (2.2.4 Glossary notation). The symbol ???? in the glossary entry number is used for words that do not have a sequential number assigned by the standard, because assigning sequential numbers is the Technical Committee's prerogative. For the same reason, the symbol OI (from Open Interpreter) is used in place of the section number.

5. How to use this specification

Do not try to maximize portability. If you use a Class 1 system, do not write code using the Class 5 names if you cannot test this code on a Class 5 system. It is very probable that despite the use of Class 5 names, your code will make assumptions that are true only on Class 1 (or some other class, but not Class 5).

While reading this paper, you will probably think that Class 1 is too narrow for you, while the Class 5 word set is almost unusable. This reflects the fact that there is a compromise between elegance and portability. Do not go to extremes: do not try to maximize (or minimize) a single characteristic. Find what is optimal for you. (By the way, the same recommendation may be applied to the use of ANS Forth.)


OI The optional Open Interpreter word set
OI.1 Introduction

Since the very first implementations, Forth has allowed access to the return addresses on the return stack. Nevertheless, the problem of the portability of this technique was not solved until the end of the 1990s. The five classes of Open Interpreter systems allow programmers to choose the most adequate degree of compromise between portability and convenience of programming.

OI.2 Additional terms and notation
OI.2.1 Definition of terms and classes of Open Interpreter systems
OI.2.1.1 The five classes of Open Interpreter systems

Definition. There are 5 classes of Open Interpreter Forth systems:

Class 1. Return addresses have the same format as data addresses, the system uses threaded code which resides in data memory.

Class 2. Return addresses are 1 cell wide, but their representation on the return stack may be different from that on the data stack. Threaded code resides in data memory, and data stored into threaded code may be accessed by data memory access operators, such as @ . Both aligned and unaligned addresses may be converted to the return stack representation.

Class 3. Return addresses are 1 cell wide, their representation may be different from that of data addresses. Threaded code may reside in a separate memory, and special words may be required to access that memory. Both aligned and unaligned addresses may be converted to the return stack representation.

Class 4. Return addresses may be more than 1 cell wide, and special words may be required to access threaded code. Both aligned and unaligned addresses may be converted to the return stack representation. The size of a character is an integral multiple of the size of a code memory address unit.

Class 5. Return addresses may be more than 1 cell wide, and special words may be required to access threaded code. Conversion to the return stack representation is allowed only for compiled-token-aligned return addresses. The size of one code memory address unit may exceed the size of a character.

Each class is a subclass of the next class. (End of the definition.)

The Open Interpreter class of a system (N) is the class with the minimal ordinal number N (and maximal requirements) which the system can implement. (A system is said to be of the Open Interpreter Class N if it complies with the requirements for systems of classes from N to 5 and does not comply with the requirements for other classes.)
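The rule "the minimal N such that the system complies with the requirements of classes N to 5" can be made concrete with a small Python sketch. The system description fields (ra_same_as_data_addr, code_in_data_memory, ra_cells, unaligned_convertible, char_multiple_of_code_unit) are hypothetical, and each predicate is only a rough paraphrase of the class definitions above, not a complete statement of them.

```python
# Rough, hypothetical paraphrase of the per-class requirements.
REQUIREMENTS = {
    1: lambda s: s["ra_same_as_data_addr"] and s["code_in_data_memory"],
    2: lambda s: (s["ra_cells"] == 1 and s["code_in_data_memory"]
                  and s["unaligned_convertible"]),
    3: lambda s: s["ra_cells"] == 1 and s["unaligned_convertible"],
    4: lambda s: s["unaligned_convertible"] and s["char_multiple_of_code_unit"],
    5: lambda s: True,  # no extra requirements in this model
}


def complies(system, n):
    """True if the system meets the requirements of classes n..5."""
    return all(REQUIREMENTS[k](system) for k in range(n, 6))


def oi_class(system):
    """The Open Interpreter class: the minimal N with complies(system, N)."""
    return min(n for n in range(1, 6) if complies(system, n))


classical = {"ra_same_as_data_addr": True, "code_in_data_memory": True,
             "ra_cells": 1, "unaligned_convertible": True,
             "char_multiple_of_code_unit": True}
print(oi_class(classical))  # 1
```

Checking classes N to 5 together, rather than class N alone, mirrors the parenthetical definition above: because each class is a subclass of the next, a system of class N also satisfies every higher class.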

OI.2.1.2 Definition of terms

aligned code pointer: a code memory address at which a compiled token or a reference may be located.

cell-aligned code pointer: a code memory address at which a data cell may be located. Required to be the same as aligned code pointer.

code interpreter: the interpreter that processes threaded code, as specified in OI.3.4 The executable code and the code interpreter.

code memory address unit: the size of a code memory address unit may be different from that of a data memory address unit. See: address unit in 2.1 Definition of terms.

code pointer: the address of a threaded code element (or, which is the same, the address of the threaded code fragment starting from that threaded code element).

compiled token: a threaded code element that denotes the execution semantics of some procedure. When a compiled token is processed by the code interpreter, the corresponding execution semantics are performed. Different compiled tokens may have different sizes, but those generated by the word TOKEN all have the same system-defined size.

current code fragment: The code fragment whose compilation has been started but not yet ended.

high-level definition: a definition created by the word : (colon) or by the CREATE...DOES> construct. The execution semantics of a high-level definition are implemented using threaded code.

in-line data (stored into threaded code): data stored into threaded code. The procedure whose compiled token precedes in-line data is responsible for processing these data. The procedure must also prevent processing of the in-line data by the code interpreter, for example, by advancing IP to the compiled token next to the data.

interpretation pointer (IP): the pointer to the next compiled token to be processed by the code interpreter. More precisely, the interpreter fetches the compiled token at IP, then advances IP to the next threaded code element, then executes the semantics denoted by the compiled token. See OI.3.4 The executable code and the code interpreter.

interpretation stack: the stack formed by IP (the top) and the return stack (the rest). The interpretation stack contains (1) code pointers that reflect the currently unfinished procedure calls, and (2) data that procedures place onto the return stack. The top interpretation stack element (IP) is always a code pointer.

IP: see interpretation pointer.

reference (to a threaded code fragment): a threaded code element that identifies the location of another threaded code element (and of the threaded code fragment starting from that element). The format of threaded code references is implementation-defined. This format may be used to represent destination locations of control transfers.

return address: a code pointer that usually either a) is the run-time nesting information generated by the threaded code interpreter when a high-level definition is called, or b) may be placed onto the return stack to let the code interpreter execute a code fragment; c) (rarely) any code pointer that is, or could be, used as, or instead of, a return address in the sense of items a) and b).

threaded code: a) a sequence of threaded code elements; b) the representation of a program in the form of sequences of threaded code elements.

threaded code element: either a compiled token, a reference to a threaded code fragment, or in-line data.

threaded code fragment: a sequence of threaded code elements.

threaded code interpreter: the same as code interpreter.

unaligned code pointer: a code memory address at which an in-line data element may be located. A compiled token and a reference may be located only at compiled-token-aligned addresses (aligned code pointers).

OI.2.2 Notation
OI.2.2.1 Interpretation stack notation

The interpretation stack notation is:
         ( I: before -- after )
The symbol "I:" is the interpretation stack stack-id. See 2.2.2 Stack notation.

Advancing IP to the next compiled token (see OI.3.4 The executable code and the code interpreter) is attributed to the threaded code interpreter and therefore is not included into the interpretation stack effect.

OI.2.2.2 Stored data notation

cp[ <data> ]      a code pointer cp, at which <data> are stored
cp+               the code pointer cp advanced by the size of data stored at cp
addr[ <data> ]    address addr at which <data> are stored

OI.3 Additional usage requirements

A system that provides either the Open Interpreter In-Line Data Access word set or the Open Interpreter Threaded Code Access word set shall provide the Open Interpreter word set.

OI.3.1 Data types

Append table OI.1 to table 3.1. Two different formats may be used to keep code pointers on the data stack and on the return stack. The data stack format is suitable for the read (or read/write) access to the code memory; the return stack format is suitable for the code interpreter.

Table OI.1 - Data Types
==============================================================================
Symbol   Data type                     Size on stack
------------------------------------------------------------------------------
acp-r    aligned code pointer (1)      depends on the system's class (3)
acp-s    aligned code pointer (2)      depends on the system's class (3)
ucp-r    unaligned code pointer (1)    depends on the system's class (3)
ucp-s    unaligned code pointer (2)    depends on the system's class (3)
acp      aligned code pointer (4,5)    depends on the system's class (3)
ucp      unaligned code pointer (4,5)  depends on the system's class (3)
cp       code pointer (6,5)            depends on the system's class (3)
ct       compiled token                none (size in code is implementation-defined)
ref      reference                     none (size in code is implementation-defined)
l*x (7)  any data type                 0 or more cells
==============================================================================

(1) in the return stack representation
(2) in the data stack representation

(3) 1 cell (Classes 1-3); implementation-defined (Classes 4,5).
(4) the symbols ucp and acp denote, respectively, the types ucp-s and acp-s on data stack diagrams and the types ucp-r and acp-r on return stack diagrams.
(5) When this symbol appears in both return stack and data stack diagrams suffixed with the same digit, it denotes the same value in the two representations. For example, the notation "( cp -- ) ( R: -- cp ) Move cp from the data stack to the return stack" means for Classes 1-4 "( ucp-s -- ) ( R: -- ucp-r ) Convert ucp-s to the return stack representation ucp-r, remove ucp-s from the data stack and place ucp-r onto the return stack".
(6) the symbol cp denotes the data type ucp for Classes 1-4 and the data type acp for Class 5.
(7) Like i*x, j*x, k*x, it may be an undetermined number of stack entries of unspecified type. See table 3.1.

OI.3.2 Data type relationships

The data type relationships for systems of different classes are given in table OI.2. The phrase "=> j" in the row corresponding to data type i denotes "i is a subtype of j"; the phrase "= j" denotes "i is the same data type as j". The notation S: i indicates that the row describes the meaning of the data type symbol i on data stack diagrams; analogously, the notation R: i is used to describe the meaning of i on return stack diagrams.

Table OI.2 - Data Type Relationships
==============================================================================
Open Interpreter  Class 1      Class 2      Class 3      Class 4      Class 5
data type         data type    data type    data type    data type    data type
------------------------------------------------------------------------------
ucp-r             =addr        =>x          =>x          unspecified  does not exist
acp-r             =a-addr      =>ucp-r      =>ucp-r      =>ucp-r      unspecified
ucp-s             =addr        =>addr       =>u          =>i*x        =>i*x
acp-s             =a-addr      =>a-addr     =>ucp-s      =>ucp-s      =>ucp-s
------------------------------------------------------------------------------
R: ucp            =ucp-r       =ucp-r       =ucp-r       =ucp-r       does not exist
R: acp            =acp-r       =acp-r       =acp-r       =acp-r       =acp-r
S: ucp            =ucp-s       =ucp-s       =ucp-s       =ucp-s       =ucp-s
S: acp            =acp-s       =acp-s       =acp-s       =acp-s       =acp-s
R: cp             =ucp-r       =ucp-r       =ucp-r       =ucp-r       =acp-r
S: cp             =ucp-s       =ucp-s       =ucp-s       =ucp-s       =acp-s
==============================================================================

See: A.OI.3.2 Data type relationships.

OI.3.3 Threaded code memory addresses

A code memory address identifies a location in the code memory space with a size of one code memory address unit, which a program may fetch from or store into or transfer control to except for the restrictions established in this Standard. The size of a code memory address unit is specified in bits. Each distinct code memory address value identifies exactly one such storage element.

The set of character-aligned code memory addresses, addresses at which a character can be accessed, is an implementation-defined subset of all code memory addresses. Adding the size of a character to a character-aligned code memory address shall produce another character-aligned code memory address.

The set of compiled-token-aligned (aligned) code memory addresses, addresses at which a compiled token or a reference can be accessed, is an implementation-defined subset of all code memory addresses. Adding the size of a reference or of a compiled token to a compiled-token-aligned address shall produce another compiled-token-aligned address. Code memory addresses (compiled-token-aligned, unaligned) are also called code pointers (aligned, unaligned).

The set of cell-aligned code memory addresses is an implementation-defined subset of character-aligned code memory addresses. The set of cell-aligned code memory addresses is the same as the set of compiled-token-aligned code memory addresses. Adding the size of a cell to a cell-aligned code memory address shall produce another cell-aligned code memory address.

Two representations are used for code pointers: the data stack format and the return stack format (for Class 1 systems they are the same). The return stack representation is the one used by the code interpreter; it allows code to be executed. The data stack representation permits address arithmetic and access to threaded code elements.
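As an illustration only, here is a Python model of a hypothetical Class 2 system in which a one-cell return address is kept on the return stack as an offset from a code segment base. CODE_BASE, to_r, from_r, rr_from, to_rr and skip_inline are assumed names; modelling RR> and >RR as converting between the two representations is my reading of note (5) to Table OI.1. Address arithmetic, such as skipping in-line data, is done on the data stack form.

```python
# Hypothetical Class 2 system: return addresses are one cell, but the
# return stack keeps them as offsets from CODE_BASE, while the data
# stack keeps plain data-memory addresses.
CODE_BASE = 0x4000  # assumed segment base, illustrative only


def to_r(ucp_s):
    """Data stack representation -> return stack representation."""
    return ucp_s - CODE_BASE


def from_r(ucp_r):
    """Return stack representation -> data stack representation."""
    return ucp_r + CODE_BASE


def rr_from(ds, rs):
    """RR> : move a code pointer to the data stack, converting it."""
    ds.append(from_r(rs.pop()))


def to_rr(ds, rs):
    """>RR : move a code pointer to the return stack, converting it."""
    rs.append(to_r(ds.pop()))


def skip_inline(ds, rs, size):
    """Advance the return address on top of rs past `size` address units
    of in-line data; the addition is done in data stack form."""
    rr_from(ds, rs)
    ds.append(ds.pop() + size)
    to_rr(ds, rs)


ds, rs = [], [to_r(0x4010)]
skip_inline(ds, rs, 4)
print(hex(from_r(rs[0])))  # 0x4014
```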

The code memory address units do not necessarily have the same size as data space address units. The size of a cell in data space address units may be different from the size of a cell in code memory address units.

The size of a reference and the size of a compiled token shall be integral multiples of the size of a code memory address unit.
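The alignment rules above can be checked on one concrete parameter choice. In this Python sketch all sizes are in code memory address units, and the values of CHAR_SIZE, CELL_SIZE, CT_SIZE and REF_SIZE, as well as the helper names, are assumptions; a real system documents its own alignment requirements (see OI.4.1.1).

```python
# Hypothetical parameters, all in code memory address units.
CHAR_SIZE = 1   # a character is one code address unit
CELL_SIZE = 4
CT_SIZE = 4     # compiled token
REF_SIZE = 8    # reference (may differ from a compiled token)

# Compiled-token-aligned addresses are the same as cell-aligned ones.
# CT_SIZE, REF_SIZE and CELL_SIZE are multiples of ALIGNMENT, which is
# what makes the closure requirements below hold.
ALIGNMENT = CELL_SIZE


def is_aligned(cp):
    return cp % ALIGNMENT == 0


def align_up(cp):
    """Smallest aligned code pointer that is >= cp."""
    return -(-cp // ALIGNMENT) * ALIGNMENT


def next_token_after_string(cp, nchars):
    """Aligned position for the next compiled token after nchars
    characters of in-line data stored at cp."""
    return align_up(cp + nchars * CHAR_SIZE)


# Closure: adding a token, reference or cell size to an aligned
# address yields another aligned address.
for cp in range(0, 64, ALIGNMENT):
    for size in (CT_SIZE, REF_SIZE, CELL_SIZE):
        assert is_aligned(cp + size)

print(next_token_after_string(8, 3))  # 12
```

Padding in-line data up to the next aligned address is one way to avoid the ambiguous condition of compiling a word while the code memory pointer is not compiled-token-aligned.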

OI.3.4 The executable code and the code interpreter

The executable code used by the Forth code interpreter is called threaded code. Threaded code is a sequence of threaded code elements, each of which may be one of:
·         a compiled token of a procedure (that is, of a definition)
·         a reference to threaded code (branch destinations are represented in this format)
·         in-line data.
Only compiled tokens of procedures are processed by the threaded code interpreter; the other two types of threaded code elements are processed by procedures. The procedure compiled immediately before in-line data and/or reference(s) shall modify IP to point to a valid compiled token, so that the code interpreter does not access them.

The threaded code interpreter (the "inner" interpreter of Forth) has:
·         a register (IP, the interpretation pointer) that points to the next threaded code element to be processed, and
·         a stack (the return stack), to which the interpreter saves IP when it calls a threaded code fragment, and from which it loads IP exiting the threaded code fragment.
Together, IP and the return stack form the interpretation stack.

The threaded code interpreter repeats the following steps: it fetches the compiled token at IP, advances IP to the next threaded code element, and executes the semantics denoted by the compiled token. The semantics may imply changing IP. See OI.6.1.0450 : , OI.6.1.0460 ; , OI.6.1.1250 DOES> , OI.6.1.1380 EXIT .
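The interpreter loop described above (fetch, advance IP, execute) can be sketched in Python. This is a toy model, not the specification's definition: the names Colon, VM, lit, add and do_exit are hypothetical, each in-line datum occupies exactly one list element, and the model does not distinguish the data stack and return stack representations. The primitive lit shows how a procedure followed by in-line data moves IP past it, as required above.

```python
# Toy threaded code interpreter (illustrative only). Code memory is a
# Python list; an element is a compiled token (a Colon object or a
# primitive function) or in-line data (any other value).


class Colon:
    """Compiled token of a high-level definition starting at `start`."""
    def __init__(self, start):
        self.start = start


class VM:
    def __init__(self, code):
        self.code = code
        self.ip = None   # interpretation pointer
        self.rs = []     # return stack
        self.ds = []     # data stack

    def run(self, start):
        self.rs.append(None)          # returning to None stops the loop
        self.ip = start
        while self.ip is not None:
            ct = self.code[self.ip]   # fetch the compiled token at IP
            self.ip += 1              # advance IP to the next element
            if isinstance(ct, Colon):
                self.rs.append(self.ip)  # nesting: save IP
                self.ip = ct.start
            else:
                ct(self)              # primitive; it may change IP


def do_exit(vm):    # EXIT: load IP from the return stack
    vm.ip = vm.rs.pop()


def lit(vm):        # push the in-line cell that follows, then skip it
    vm.ds.append(vm.code[vm.ip])
    vm.ip += 1


def add(vm):        # +
    b = vm.ds.pop()
    a = vm.ds.pop()
    vm.ds.append(a + b)


code = [
    lit, 2, lit, 3, add, do_exit,     # 0: FIVE  ( -- 5 )
    Colon(0), lit, 10, add, do_exit,  # 6: main  ( -- 15 )
]
vm = VM(code)
vm.run(6)
print(vm.ds)  # [15]
```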

The interpretation stack can contain:
·         code pointers that reflect the currently unfinished procedure calls, and
·         data that procedures place onto the return stack.
The top interpretation stack element (IP) is always a code pointer.

Programs written for Open Interpreter Forth are allowed to change the number and order of interpretation stack elements. Programs written for Open Interpreter Forth are allowed to change control flow by changing the interpretation stack.

Programs are allowed to place data which are not threaded code fragment addresses onto the return stack, but these programs shall be written so that such data are never loaded into IP.
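A program that changes the interpretation stack changes the control flow. In the hypothetical Python model below (a minimal toy threaded code interpreter with one-cell return addresses, written out so that the example is self-contained), a definition TWICE, written as RR@ >RR, duplicates its own return address, so the rest of its caller runs twice before the caller returns. TWICE and all names in the model are illustrative assumptions; in a classical (Class 1) system the same idea would be written with R@ >R.

```python
# Minimal threaded code interpreter (hypothetical model, one cell per
# return address, same representation on both stacks).


class Colon:
    """Compiled token of a high-level definition starting at `start`."""
    def __init__(self, start):
        self.start = start


class VM:
    def __init__(self, code):
        self.code, self.ip, self.rs, self.ds = code, None, [], []

    def run(self, start):
        self.rs.append(None)          # returning to None stops the loop
        self.ip = start
        while self.ip is not None:
            ct = self.code[self.ip]   # fetch the compiled token at IP
            self.ip += 1              # advance IP
            if isinstance(ct, Colon):
                self.rs.append(self.ip)
                self.ip = ct.start
            else:
                ct(self)
        return self.ds


def do_exit(vm):   # EXIT: reload IP from the return stack
    vm.ip = vm.rs.pop()


def rr_fetch(vm):  # RR@  ( -- cp ) ( R: cp -- cp )
    vm.ds.append(vm.rs[-1])


def to_rr(vm):     # >RR  ( cp -- ) ( R: -- cp )
    vm.rs.append(vm.ds.pop())


def lit(vm):       # push the in-line cell that follows, then skip it
    vm.ds.append(vm.code[vm.ip])
    vm.ip += 1


def add(vm):       # +
    b = vm.ds.pop()
    a = vm.ds.pop()
    vm.ds.append(a + b)


code = [
    rr_fetch, to_rr, do_exit,        # 0: TWICE  RR@ >RR ;
    Colon(0), lit, 1, add, do_exit,  # 3: INC2   TWICE 1 + ;
    lit, 0, Colon(3), do_exit,       # 8: main   0 INC2
]
print(VM(code).run(8))  # [2]
```

Here "1 +" after TWICE executes twice: the first time when TWICE exits to its duplicated return address, and the second time when INC2's own EXIT pops the original copy.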

OI.3.5 Environmental queries

Append table OI.3 to table 3.5.

See: 3.2.6 Environmental queries

Table OI.3 - Environmental Query Strings
===============================================================================
String           Value data type  Constant?  Meaning
-------------------------------------------------------------------------------
OPEN-INTERP      flag             no         Open Interpreter word set present
OPEN-INTERP-EXT  flag             no         Open Interpreter extensions word set present
OI-DATA          flag             no         Open Interpreter in-line data access word set present
OI-DATA-EXT      flag             no         Open Interpreter in-line data access extensions word set present
OI-CODE          flag             no         Open Interpreter threaded code access word set present
OI-CODE-EXT      flag             no         Open Interpreter threaded code access extensions word set present
===============================================================================

OI.4 Additional documentation requirements
OI.4.1 System documentation
OI.4.1.1 Implementation-defined options

         - class of the system;
         - size and format of code pointers on the data stack and on the return stack;
         - whether code space is a part of the data space, whether code is in a separate memory space;
         - the algorithms that convert return addresses from the data stack representation to the return stack one and vice versa;
         - alignment requirements for threaded code elements;
         - whether unaligned addresses may be correctly converted to the return stack representation;
         - whether writing to code space is possible at run-time;
         - environmental restrictions (if any) and additional disciplines they impose.

OI.4.1.2 Ambiguous conditions

         - loading IP with a value which is not a valid compiled token address in the return stack representation;
         - compiling a word (adding corresponding semantics to the current definition) when the code memory pointer is not compiled-token-aligned;
         - writing to code space at run-time;
         - converting an unaligned code pointer to the return stack representation (Class 5 only);
         - an unaligned code pointer is used where an aligned code pointer is required.

The following specific ambiguous conditions are noted in the glossary entries of the relevant words:

         - the value passed to OI.6.3.???? RP! does not correspond to any valid return stack depth;
         - OI.6.3.???? RP! removes data that control nesting structures from the return stack, and the program does not restore such data (see: OI.6.2.???? R-SAVE-SYS, OI.6.2.???? R-RESTORE-SYS);
         - an exception frame is removed by OI.6.3.???? RP!;
         - word not defined via 6.1.1000 CREATE (OI.6.1.1250 DOES>);
         - xt passed to OI.6.3.???? >TCODE does not correspond to a colon definition;
         - the destination address is unreachable (OI.6.3.???? REF!);
         - ct has not been stored with TOKEN or TOKEN! (OI.6.3.???? TOKEN@, OI.6.3.???? TOKEN+, OI.6.3.???? TOKEN>);
         - the threaded code space pointer is not compiled-token-aligned when OI.6.5.???? /, begins execution;
         - on Class 5 systems, the code memory has not been allocated as a single cell (OI.6.5.???? /@);
         - the code memory address is not character-aligned (OI.6.5.???? /C!, OI.6.5.???? /C@);
         - on Class 5 systems, the code memory at ucp has not been allocated as a single character (OI.6.5.???? /C@).

OI.4.1.3 Other system documentation

         - the structure of executable code;
         - how control structures are implemented;
         - environmental restrictions, if any, and programming disciplines required in this connection

OI.4.2 Program documentation

         - the class of Open Interpreter required by the program
         - whether program writes to code memory at run-time
         - (optional) environmental restrictions which the system that runs the program is allowed to have.

OI.5 Compliance and labeling

Throughout section OI.5, the symbol wordset-name denotes one of the following word sets: the Open Interpreter word set, the Open Interpreter Threaded Code Access word set, the Open Interpreter In-Line Data Access word set; the symbol N denotes the Open Interpreter class number of the system.

OI.5.1 ANS Forth systems

The phrase "Providing the
wordset-name word set (specification ver.2.0, proposed in <this publication>)" shall be appended to the label of any Standard System that provides all of the wordset-name word set.

The phrase "Providing
name(s) from the wordset-name extension word set (specification ver.2.0, proposed in <this publication>)" shall be appended to the label of any Standard System that provides portions of the wordset-name extension word set.

The phrase "Providing the
wordset-name extension word set (specification ver.2.0, proposed in <this publication>)" shall be appended to the label of any Standard System that provides all of the wordset-name and wordset-name extension word set.

The phrase "Providing the wordset-name word set with environmental restrictions (specification ver.2.0, proposed in <this publication>)", or "Providing name(s) from the wordset-name extension word set with environmental restrictions (specification ver.2.0, proposed in <this publication>)", or "Providing the wordset-name extension word set with environmental restrictions (specification ver.2.0, proposed in <this publication>)" shall be appended to the label of any Standard System that provides names from the wordset-name [extension] word set but imposes additional restrictions on their use.

The phrase "of Open Interpreter Class N (specification ver. 2.0, proposed in <this publication>)" shall be appended to the label of any Standard System providing the Open Interpreter word set to indicate its Open Interpreter class.

OI.5.2 ANS Forth programs

The phrase "Requiring Open Interpreter Class N (specification ver. 2.0, proposed in <this publication>)" shall be appended to the label of Standard Programs that assume the system to have an Open Interpreter class not higher than N.

The phrase "Requiring the wordset-name word set (specification ver. 2.0, proposed in <this publication>)" shall be appended to the label of Standard Programs that require the system to provide the wordset-name word set.

The phrase "Requiring name(s) from the wordset-name Extension word set (specification ver. 2.0, proposed in <this publication>)" shall be appended to the label of Standard Programs that require the system to provide portions of the wordset-name Extension word set.

The phrase "Requiring the wordset-name Extensions word sets (specification ver. 2.0, proposed in <this publication>)" shall be appended to the label of Standard Programs that require the system to provide all of the wordset-name and wordset-name Extensions word sets.

OI.6 Glossary
OI.6.1 The Open Interpreter words

OI.6.1.0450 :                      "colon"                    OI

Replace the specification 6.1.0450 : with the following one:

( C: "<spaces>name" -- colon-sys )

Skip leading space delimiters. Parse name delimited by a space. Create a definition for name, called a "colon definition". Enter compilation state and start the current definition, producing colon-sys.

The execution semantics of name will be determined by the words compiled into the body of the definition. The current definition shall not be findable in the dictionary until it is ended (or until the execution of DOES> in some systems). The code space pointer is aligned when : finishes execution.

name Initiation: ( -- ) ( I: cp1 -- cp1 acp2 )

Push the current value of IP onto the return stack and load IP with acp2, the address of the threaded code fragment in name's body, thus transferring control to the body of the definition.

name Execution: ( i*x -- j*x ) ( I: k*x cp1 -- l*x acp3 )

Perform the initiation semantics of name. The rest of the execution semantics and the stack effects are determined by the words compiled into the body of the definition. A compiled token must be located at the code memory address acp3. The symbols i*x and j*x represent arguments to and results from name, respectively. The symbols k*x and l*x represent changes on the return stack.

Note. If the optional Locals word set is present, the elements of the return stack are unavailable after declaration of locals. Nevertheless, after declaration of locals the top return stack element shall be an address to which EXIT may transfer control.

See: 6.1.0450 :, A.OI.6.1.0450 :, RFI 0005 Initiation semantics.

OI.6.1.0460 ;                      "semicolon"                        OI

Replace the specification 6.1.0460 ; with the following one:

Interpretation: Interpretation semantics for this word are undefined.

Compilation: ( C: colon-sys -- )

Append the run-time semantics below to the current definition. End the current definition, allow it to be found in the dictionary and enter interpretation state, consuming colon-sys. If the data-space pointer is not aligned, reserve enough data space to align it.

Run-time: ( -- ) ( I: acp1 cp2 -- acp1 )

Transfer control to the code fragment specified by acp1.

See: 6.1.0460 ; , A.OI.6.1.0460 ; , OI.6.1.0450 : , OI.6.1.1380 EXIT .

OI.6.1.???? >RR                   "to-double-r"                      OI
         ( cp -- ) ( R: -- cp )
Move cp from the data stack to the return stack, converting it to the return stack format. On Class 1 systems, >RR is equivalent to >R .

OI.6.1.???? >RR<                  "to-double-r-and-back"   OI
         ( cp1 -- cp2 ) ( R: cp2 -- cp1 )
Exchange cp1 at the data stack top with cp2 at the return stack top, changing their representation. For Class 1 - Class 3 systems, >RR< is equivalent to RR> SWAP >RR .
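The return stack words can be illustrated on a Class 1 system, where the glossary equates >RR with >R, RR> with R>, and (for Class 1 - Class 3) >RR< with the phrase RR> SWAP >RR. The following ANS Forth sketch assumes such a system and defines the three words as immediate macros, so that the CORE words are compiled into the body of the calling definition and the return stack stays balanced. DEMO-RR is a hypothetical name, and small integers stand in for code pointers.

```forth
\ Class 1 sketch only: code pointers have the same format on both stacks,
\ so the OI words reduce to CORE return-stack words. Each is an immediate
\ macro, so >R and R> are compiled into the body of the calling definition.
: >RR  ( cp -- ) ( R: -- cp )  POSTPONE >R ; IMMEDIATE
: RR>  ( -- cp ) ( R: cp -- )  POSTPONE R> ; IMMEDIATE
\ >RR< as the phrase RR> SWAP >RR (valid on Class 1 - Class 3 systems).
: >RR< ( cp1 -- cp2 ) ( R: cp2 -- cp1 )
   POSTPONE R> POSTPONE SWAP POSTPONE >R ; IMMEDIATE

\ Hypothetical demo: 1 goes to the return stack, is exchanged with 2,
\ and 2 is then moved back; integers stand in for code pointers.
: DEMO-RR ( -- x1 x2 )  1 >RR  2 >RR<  RR> ;
```

Executing DEMO-RR leaves 1 and 2 on the data stack, with 2 on top: >RR< brings the 1 back from the return stack while sending the 2 there, and RR> then returns the 2.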

OI.6.1.1250 DOES>                 "does"                              OI

Replace the specification 6.1.1250 DOES> with the following one:

Interpretation: Interpretation semantics for this word are undefined.

Compilation: ( C: colon-sys1 -- colon-sys2 )

Append the run-time semantics below to the current definition. Whether or not the current definition is rendered findable in the dictionary by the compilation of DOES> is implementation defined. Consume colon-sys1 and produce colon-sys2. Append the initiation semantics given below to the current definition.

Run-time: ( -- ) ( I: acp1 cp2 -- acp1 )

Replace the execution semantics of the most recent definition, referred to as name, with the name execution semantics given below. Transfer (return) control to the (calling) threaded code fragment specified by acp1. An ambiguous condition exists if name was not defined with CREATE or a user-defined word that calls CREATE .

name Initiation: ( -- a-addr ) ( I: cp3 -- cp3 acp4 )

Place name's data field address on the stack. Push the current value of IP onto the return stack and load IP with acp4, the address of the threaded code fragment that follows the DOES> which modified name, thus transferring control to the DOES> part of that definition.

name Execution: ( i*x -- j*x ) ( I: k*x cp3 -- l*x acp5 )

Perform the initiation semantics of name. The rest of the execution semantics and the stack effects are determined by the words compiled after the DOES> which modified name. A compiled token must be located at the code memory address acp5. The symbols i*x and j*x represent arguments to and results from name, respectively. The symbols k*x and l*x represent changes on the return stack.
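The name initiation step can be observed with standard code: a child of a DOES> defining word first places its data field address on the stack and then runs the code compiled after DOES>. In the following ANS Forth sketch, SLOT-OF has an empty DOES> part, so a child of it returns exactly the address given by >BODY. CONSTANT-LIKE, SLOT-OF, ANSWER and SLOT are hypothetical names.

```forth
\ The DOES> part of each child receives the child's data field address.
: CONSTANT-LIKE ( x "name" -- )  CREATE ,  DOES> ( a-addr -- x ) @ ;
\ Empty DOES> part: only the initiation step (pushing a-addr) is visible.
: SLOT-OF ( "name" -- )  CREATE 0 ,  DOES> ( a-addr -- a-addr ) ;

42 CONSTANT-LIKE ANSWER   \ ANSWER leaves 42
SLOT-OF SLOT              \ SLOT leaves its own data field address
```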

See: A.6.1.1250 DOES> , OI.6.1.0450 :, A.OI.6.1.0450 : , RFI 0003 Defining words etc., RFI 0005 Initiation semantics.

OI.6.1.1370 EXECUTE CORE
         ( i*x xt -- j*x ) ( I: k*x -- l*x )
Remove xt from the stack and perform the semantics identified by it. Other stack effects are due to the word EXECUTEd. The stack effect of the executed word is assumed to be:
         ( i*x -- j*x ) ( I: k*x -- l*x )
See: 6.1.1370 EXECUTE, OI.6.1.???? RUSH.

OI.6.1.1380 EXIT                                                               OI
         ( I: acp1 cp2 -- acp1 )
Replace the specification 6.1.1380 EXIT with the following one:

Transfer control to the code fragment specified by acp1.

OI.6.1.???? RR>                            "double-r-from"           OI
         ( -- cp ) ( R: cp -- )
Move cp from the return stack to the data stack, converting it to the data stack format. On Class 1 systems, RR> is equivalent to R> .

OI.6.1.???? RR@                   "double-r-fetch"                  OI
         ( -- cp ) ( R: cp -- cp )
Copy cp from the return stack top to the data stack, converting it to the data stack format for code pointers. On Class 1 systems, RR@ is equivalent to R@ .

OI.6.1.???? RRDROP                "double-r-drop"           OI
         ( -- ) ( R: cp -- )
Remove cp from the return stack. On Class 1 - Class 3 systems, RRDROP is equivalent to R> DROP .
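On Class 1 systems the glossary equates RR@ with R@, and on Class 1 - Class 3 systems RRDROP with R> DROP. The following ANS Forth sketch assumes a Class 1 system and, since >RR and RR> are then equivalent to >R and R>, uses those CORE words directly for pushing and popping. DEMO-PEEK is a hypothetical name, and integers stand in for code pointers.

```forth
\ Class 1 sketch: immediate macros that compile CORE return-stack words.
: RR@    ( -- cp ) ( R: cp -- cp )  POSTPONE R@ ; IMMEDIATE
: RRDROP ( -- ) ( R: cp -- )        POSTPONE R> POSTPONE DROP ; IMMEDIATE

\ Push 10 and 20, copy the top (20) with RR@, discard it with RRDROP,
\ then move the remaining 10 back to the data stack.
: DEMO-PEEK ( -- x1 x2 )  10 >R 20 >R  RR@ RRDROP  R> ;
```

DEMO-PEEK leaves 20 and 10 on the data stack, with 10 on top, and leaves the return stack as it found it.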

OI.6.1.???? RUSH                                                               OI
         ( i*x xt -- j*x ) ( I: k*x cp1 -- l*x )
Remove the top interpretation stack element cp1, then execute xt, that is, remove xt from the stack and perform the semantics identified by it, as with EXECUTE . Other stack effects are due to the word executed. The stack effect of the executed xt is assumed to be:
         ( i*x -- j*x ) ( I: k*x -- l*x )

See: OI.6.1.1380 EXIT, 6.1.1370 EXECUTE, OI.6.1.1370 EXECUTE, A.OI.6.1.???? RUSH.

OI.6.2 The open interpreter extension words

OI.6.2.???? R-RESTORE-SYS        "r-restore-sys"                   OI-EXT
         ( -- ) ( R: xn ... x1 n -- )
Restore the implementation-dependent data xn ... x1 about enclosing structures, as saved by R-SAVE-SYS.

See: A.OI.6.2.???? R-SAVE-SYS, OI.6.2.???? R-SAVE-SYS, OI.6.2.???? RP! .

OI.6.2.???? R-SAVE-SYS   "r-save-sys"                      OI-EXT
         ( -- ) ( R: -- xn ... x1 n )
Save implementation-dependent data on the return stack. These data describe enclosing structures; this information may be lost when a non-local exit is performed with RP!. The information about enclosing structures (more precisely, the system's copy of it) does not change when a threaded code fragment is called or exited, or when values are placed onto or removed from the return stack.

See: A.OI.6.2.???? R-SAVE-SYS, OI.6.2.???? R-RESTORE-SYS, OI.6.2.???? RP!, OI.6.2.???? RP@.

OI.6.2.???? COPY>RR               "copy-to-double-r"                OI
         ( cp -- cp ) ( R: -- cp )
Copy cp from the data stack to the return stack, converting the copy to the return stack format. For Class 1 - Class 3 systems, COPY>RR is equivalent to DUP >RR .

OI.6.2.???? RADDR@                "r-addr-fetch"           OI-EXT
         ( a-addr -- cp )
Fetch the code pointer cp stored at a-addr. For systems of Classes 1-3, this word is equivalent to @ .

OI.6.2.???? RADDR!                "r-addr-store"                     OI-EXT
         ( cp a-addr -- )
Store the return address cp at a-addr. For systems of Classes 1-3, this word is equivalent to ! .

OI.6.2.???? RADDR+                "r-addr-plus"                      OI-EXT
         ( addr1 -- addr2 )
Add the size in address units of a code pointer to addr1, giving addr2. For systems of Classes 1-3, this word is equivalent to CELL+ .

OI.6.2.???? RADDR-                "r-addr-minus"            OI-EXT
         ( addr1 -- addr2 )
Subtract the size in address units of a code pointer from addr1, giving addr2. For systems of Classes 1-3, this word is equivalent to the phrase 1 CELLS - .
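The RADDR words let a program keep code pointers in ordinary memory, for example in a small stack of saved continuations such as a backtracking scheme might use. The following ANS Forth sketch assumes a Class 1 - Class 3 system, where the glossary maps these words onto @, !, CELL+ and 1 CELLS -. CP-STACK, CP-SP, PUSH-CP and POP-CP are hypothetical names, no overflow or underflow checking is done, and integers stand in for code pointers.

```forth
\ Class 1 - Class 3 sketch: the RADDR words map onto CORE words,
\ as their glossary entries state.
: RADDR@ ( a-addr -- cp )   @ ;
: RADDR! ( cp a-addr -- )   ! ;
: RADDR+ ( addr1 -- addr2 ) CELL+ ;
: RADDR- ( addr1 -- addr2 ) 1 CELLS - ;

\ Hypothetical stack of saved code pointers (no overflow/underflow checks).
CREATE CP-STACK 4 CELLS ALLOT
VARIABLE CP-SP   CP-STACK CP-SP !
: PUSH-CP ( cp -- )  CP-SP @ RADDR!  CP-SP @ RADDR+ CP-SP ! ;
: POP-CP  ( -- cp )  CP-SP @ RADDR-  DUP CP-SP !  RADDR@ ;
```

Because only the RADDR words touch the stored values, the same code would stay correct on a system where a code pointer occupies more than one cell, provided those words are implemented for it.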

OI.6.2.???? RP@                            "r-p-fetch"                        OI-EXT
         ( -- x )
Return a system-dependent value identifying the current depth of the return stack. A Standard Program may pass this value to OI.6.2.???? RP! or compare it for equality with another such value.

OI.6.2.???? RP!                            "r-p-store"                        OI-EXT
         ( x1 -- ) ( R: i*x -- j*x )
Set the return stack depth to the one specified by x1. If the new stack depth is greater than the old stack depth, the contents of the newly allocated return stack elements are undefined. An ambiguous condition exists if x1 does not correspond to any valid return stack depth. An ambiguous condition exists if the return stack contains data that control nesting structures and the program does not restore such data. An ambiguous condition exists if an exception frame is removed by RP!.

See: OI.6.2.???? R-SAVE-SYS, A.OI.6.2.???? R-SAVE-SYS, OI.6.2.???? R-RESTORE-SYS, OI.6.2.???? RP@.

OI.6.3 The Open Interpreter threaded code access words

OI.6.3.???? /ALLOT                "slash-allot"                      OI-CODE
         ( n -- )
Calculate m, the number of code memory address units needed to store n data memory address units. If m is greater than zero, reserve m code memory address units. If m is less than zero, release |m| address units of code space. If m is zero, leave the code-space pointer unchanged. If the code-space pointer is aligned and n is a multiple of the size of a compiled token or of a reference when /ALLOT begins execution, it will remain aligned when /ALLOT finishes execution.

See: OI.6.5.???? /ALLOT .

OI.6.3.???? /HERE                 "slash-here"                       OI-CODE
         ( -- ucp )
ucp is the code memory space pointer.

OI.6.3.???? >TCODE                "to-t-code"                        OI-CODE
         ( xt -- acp )
Return the address acp of the threaded code fragment that is called when the colon definition identified by xt is executed. An ambiguous condition exists if xt does not correspond to a colon definition.

OI.6.3.???? REF!                  "ref-store"                        OI-CODE
         ( acp1 acp2 -- )
Store a reference to acp1 at acp2. After execution of this word, the reference at acp2 points to acp1. The size of the modified code memory area may be calculated with the phrase 1 REFS . An ambiguous condition exists if the destination address is unreachable. The address at which the reference is located and the address that follows it shall always be reachable.

OI.6.3.???? REF+                  "ref-plus"                         OI-CODE
         ( acp1[ ref.acp2 ] -- acp1+ )
Advance acp1 by the size of a reference.

OI.6.3.???? REF-                  "ref-minus"                        OI-CODE
         ( acp1 -- acp2 )
Decrease acp1 by the size of a reference.

OI.6.3.???? REF@                 "ref-fetch"                        OI-CODE
         ( acp1[ ref.acp2 ] -- acp2 )
Return acp2, the address to which the reference at acp1 points.

OI.6.3.???? REFS                                                     OI-CODE
         ( n1 -- n2 )
n2 is the size in data space address units of n1 references.
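The glossary leaves the representation of a reference open. To show why REF! may face unreachable destinations, the toy model below assumes one possible encoding: a reference is one cell holding the signed offset from its own address to its target, as a relative branch might be encoded. With a narrower offset field, distant targets would be unreachable; a full cell avoids that in this model. The buffer AREA is a hypothetical name standing in for code memory.

```forth
\ Toy model (assumption, not a required representation): a reference is
\ one cell holding the offset from its own address to its target.
: REFS ( n1 -- n2 )      CELLS ;
: REF! ( acp1 acp2 -- )  TUCK - SWAP ! ;   \ store acp1-acp2 at acp2
: REF@ ( acp1 -- acp2 )  DUP @ + ;         \ target = own address + offset
: REF+ ( acp1 -- acp1+ ) 1 REFS + ;
: REF- ( acp1 -- acp2 )  1 REFS - ;

CREATE AREA 8 CELLS ALLOT                  \ hypothetical stand-in for code memory
```

Because REF@ adds the stored offset to the reference's own address, moving a block of code that contains both a reference and its target would leave the reference valid in this model.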

OI.6.3.???? TOKEN!               "token-store"                      OI-CODE
         ( xt acp -- )
Store a compiled token of the procedure identified by xt into the threaded code element located at acp. The compiled token may be retrieved with the word TOKEN@ or executed by the code interpreter. The size of the modified code memory area may be calculated with the phrase 1 TOKENS .

OI.6.3.???? TOKEN,               "token-comma"             OI-CODE
         ( xt -- )
Add a compiled token of the procedure identified by xt to the current threaded code fragment. The compiled token may be executed by the code interpreter, retrieved with the word TOKEN@ , or changed with the word TOKEN! . The size of the added compiled token may be calculated with the phrase 1 TOKENS .

OI.6.3.???? TOKEN@                "token-fetch"                      OI-CODE
         ( acp[ ct ] -- xt )
Decode the compiled token ct stored at acp and return the execution token xt of the corresponding procedure. An ambiguous condition exists if ct has not been stored there with TOKEN, or TOKEN! .

OI.6.3.???? TOKEN+                "token-plus"                       OI-CODE
         ( acp[ ct ] -- acp+ )
Increment acp by the size of the compiled token ct at acp, returning the address of the next threaded code element. An ambiguous condition exists if ct has not been stored at acp with TOKEN, or TOKEN! .

OI.6.3.???? TOKEN>               "token-from"                       OI-CODE
         ( acp[ ct ] -- acp+ xt )
Decode the compiled token at acp and return the address of the next threaded code element acp+, and the execution token xt of the procedure whose compiled token ct is stored at acp. An ambiguous condition exists if ct has not been stored at acp with TOKEN, or TOKEN!. The word TOKEN> is equivalent to the phrase >RR RR@ TOKEN+ RR> TOKEN@.

OI.6.3.???? TOKENS                                                   OI-CODE
         ( n1 -- n2 )
n2 is the size in data space address units of n1 compiled tokens allocated with the word TOKEN, .
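The threaded code access words can be modelled in standard Forth to show how they fit together. The toy model below assumes that code memory is a private buffer in data space and that a compiled token is a single cell holding the execution token, roughly as in a simple indirect-threaded system; real systems may encode tokens differently. TOKEN> is written with the phrase from its own entry, using the Class 1 equivalents of >RR, RR@ and RR>. TCODE-AREA, TCP and the inner loop RUN-TOKENS are hypothetical names.

```forth
\ Toy model (assumption): code memory is a private buffer in data space,
\ and a compiled token is one cell holding the xt.
CREATE TCODE-AREA 16 CELLS ALLOT
VARIABLE TCP   TCODE-AREA TCP !       \ hypothetical code-space pointer

: TOKENS ( n1 -- n2 )       CELLS ;
: TOKEN, ( xt -- )          TCP @ !  1 TOKENS TCP +! ;
: TOKEN! ( xt acp -- )      ! ;
: TOKEN@ ( acp -- xt )      @ ;
: TOKEN+ ( acp -- acp+ )    1 TOKENS + ;
\ TOKEN> via the phrase >RR RR@ TOKEN+ RR> TOKEN@ from its glossary
\ entry, with the Class 1 mapping >RR = >R, RR@ = R@, RR> = R>.
: TOKEN> ( acp -- acp+ xt ) >R R@ TOKEN+ R> TOKEN@ ;

\ Hypothetical inner loop: execute u consecutive compiled tokens at acp.
\ acp is parked on the return stack while each token runs.
: RUN-TOKENS ( i*x acp u -- j*x )
   0 ?DO  TOKEN> SWAP >R EXECUTE R>  LOOP  DROP ;
```

For example, after ' 2* TOKEN, ' 1+ TOKEN, (with TCP pointing at TCODE-AREA), the phrase 5 TCODE-AREA 2 RUN-TOKENS leaves 11; replacing the second token with TOKEN! changes the result without touching the first token.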

OI.6.4 The Open Interpreter threaded code access extension words

None.

OI.6.5 The Open Interpreter in-line data access words

OI.6.5.???? /!                     "slash-store"                      OI-INLINE
         ( x acp -- )
Store one-cell data x at acp. On Class 1 systems, this word is equivalent to the word ! .

OI.6.5.???? /+                     "slash-plus"                       OI-INLINE
         ( n ucp1 -- ucp2 )
Calculate m, the number of code memory address units needed to store n data memory address units. Add m to ucp1, giving ucp2. For systems of Classes 1 and 2, this word is equivalent to + .
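The conversion from data address units to code address units is the only system-dependent step in /+ (and in both /ALLOT entries). The arithmetic sketch below assumes a hypothetical system whose code memory is addressed in units CODE-AU data address units wide, and rounds up so that the code memory counted is always enough. CODE-AU and >CODE-UNITS are hypothetical names; with CODE-AU set to 1 the sketch reduces to + , as on Classes 1 and 2.

```forth
\ Hypothetical parameter: data address units per code address unit.
\ 1 models Classes 1 and 2 (where /+ is plain +); 2 models, for example,
\ byte-addressed data with code memory addressed in 2-byte units.
2 CONSTANT CODE-AU

\ m = ceiling(n / CODE-AU); this sketch handles only n >= 0.
: >CODE-UNITS ( n -- m )  CODE-AU 1- +  CODE-AU / ;
: /+ ( n ucp1 -- ucp2 )   SWAP >CODE-UNITS + ;
```

With CODE-AU = 2, storing 3 data address units needs 2 code address units, so 3 100 /+ gives 102.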

OI.6.5.???? /,                     "slash-comma"             OI-INLINE
         ( x -- )
Reserve one cell of threaded code space and store x in the cell. If the threaded code space pointer is compiled token-aligned when /, begins execution, it will remain compiled token-aligned when /, finishes execution. An ambiguous condition exists if the threaded code space pointer is not compiled token-aligned when /, begins execution.

OI.6.5.???? /@                     "slash-fetch"                      OI-INLINE
         ( acp[ x ] -- x )
Fetch the one-cell literal data x located at acp. On Class 1 systems, this word is equivalent to the word @ . On Class 5 systems, an ambiguous condition exists if the code memory at acp has not been allocated as a single cell.

OI.6.5.???? /ALIGN                "slash-align"                      OI-INLINE
         ( -- )
If the code-space pointer is not aligned, reserve enough space to align it.

OI.6.5.???? /ALIGNED              "slash-aligned"           OI-INLINE
         ( ucp -- acp )
acp is the first aligned code pointer greater than or equal to ucp.

OI.6.5.???? /ALLOT                "slash-allot"                      OI-INLINE
         ( n -- )
Calculate m, the number of code memory address units needed to store n data memory address units. If m is greater than zero, reserve m code memory address units. If m is less than zero, release |m| address units of code space. If m is zero, leave the code-space pointer unchanged. If the code-space pointer is aligned and n is a multiple of the size of a cell when /ALLOT begins execution, it will remain aligned when /ALLOT finishes execution. If the code-space pointer is character-aligned and n is a multiple of the size of a character when /ALLOT begins execution, it will remain character-aligned when /ALLOT finishes execution.

See: OI.6.3.???? /ALLOT, A.OI.6.5.???? /ALLOT.

OI.6.5.???? /C!                            "slash-c-store"           OI-INLINE
         ( c ucp -- )
Store character c at ucp. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred. On Class 1 systems, this word is equivalent to the word C! . An ambiguous condition exists if ucp is not character-aligned.

OI.6.5.???? /C@                            "slash-c-fetch"           OI-INLINE
         ( ucp[ c ] -- c )
Fetch the character literal data located at ucp. An ambiguous condition exists if ucp is not character-aligned. For Class 1 systems, this word is equivalent to C@ . On Class 5 systems, an ambiguous condition exists if the code memory at ucp has not been allocated as a single character.

OI.6.5.???? /CELL+                "slash-cell-plus"                 OI-INLINE
         ( ucp1 -- ucp2 )
Advance ucp1 by the size of a cell, giving ucp2. For Class 1 and Class 2 systems, this word is equivalent to CELL+ .

OI.6.5.???? /C,                            "slash-c-comma"           OI-INLINE
         ( char -- )
Reserve space for one character in the threaded code space and store char in the space.
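On Class 1 systems the glossary equates /! with !, /@ with @, /C! with C! and /C@ with C@, and on Classes 1 and 2 /CELL+ with CELL+. The sketch below assumes a Class 1 system whose threaded code lives in ordinary data space, and additionally assumes that /, and /C, then behave as , and C, (the glossary states no such equivalence). It lays down a cell and a character as in-line data and reads them back; INLINE-DATA is a hypothetical name.

```forth
\ Class 1 sketch (assumption): threaded code lives in ordinary data space,
\ so the in-line data words reduce to CORE words. The /, and /C, mappings
\ are this sketch's own assumption.
: /,     ( x -- )         , ;
: /C,    ( char -- )      C, ;
: /@     ( acp -- x )     @ ;
: /!     ( x acp -- )     ! ;
: /C@    ( ucp -- c )     C@ ;
: /CELL+ ( ucp1 -- ucp2 ) CELL+ ;

\ CREATE aligns the data-space pointer, so the cell below is aligned.
CREATE INLINE-DATA  1234 /,  CHAR A /C,
```

Reading back, INLINE-DATA /@ yields 1234 and INLINE-DATA /CELL+ /C@ yields the character code of A.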

OI.6.5.???? /GET                  "slash-get"                        OI-INLINE
         ( addr u ucp -- )
If u is greater than 0, fill the u data space address units at addr with the contents of the corresponding amount of consecutive threaded code space address units at ucp. For systems of Classes 1 and 2, this word is equivalent to the phrase ROT ROT MOVE .

OI.6.5.???? /HERE                 "slash-here"                       OI-INLINE
See: OI.6.3.???? /HERE .

OI.6.5.???? /PUT                  "slash-put"                        OI-INLINE
         ( addr u ucp -- )
Calculate m, the number of code memory address units needed to store u data memory address units. Fill the m code memory address units at ucp with the contents of the u data memory address units at addr.
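On Class 1 and Class 2 systems the glossary gives /GET as the phrase ROT ROT MOVE. The following ANS Forth sketch assumes such a system; defining /PUT as SWAP MOVE is this sketch's own assumption, made by analogy, since the /PUT entry gives no equivalent. SRC, CODEBUF and DST are hypothetical buffers, with CODEBUF standing in for threaded code space.

```forth
\ Class 1/2 sketch: /GET copies from code space (ucp) to data space (addr).
: /GET ( addr u ucp -- )  ROT ROT MOVE ;
\ Assumption by analogy: /PUT copies from data space (addr) to code space.
: /PUT ( addr u ucp -- )  SWAP MOVE ;

CREATE SRC  CHAR x C,  CHAR y C,  CHAR z C,
CREATE CODEBUF  3 CHARS ALLOT     \ hypothetical stand-in for code space
CREATE DST      3 CHARS ALLOT
```

After SRC 3 CHARS CODEBUF /PUT and then DST 3 CHARS CODEBUF /GET, DST holds the characters x, y and z.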

OI.6.6 The Open Interpreter in-line data access extension words

OI.6.6.???? //SWAP                "double-slash-swap"               OI EXT
         ( ucp1 ucp2 -- ucp2 ucp1 )
Exchange ucp1 and ucp2 . For Class 1 - Class 3 systems, this word is equivalent to SWAP . For Class 1 - Class 4 systems, this word is equivalent to >RR >RR< RR> .

OI.6.6.???? /XSWAP                "slash-x-swap"            OI EXT
         ( ucp x -- x ucp )
Exchange ucp and x (x is at the stack top). For Class 1 - Class 3 systems, this word is equivalent to SWAP .

OI.6.6.???? X/SWAP                "x-slash-swap"            OI EXT
         ( x ucp -- ucp x )
Exchange x and ucp (ucp is at the stack top). For Class 1 - Class 3 systems, this word is equivalent to SWAP .
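The phrase given for //SWAP can be checked on a Class 1 system, where >RR, RR> and >RR< reduce to CORE words as their entries state, and where /XSWAP and X/SWAP are plain SWAP. The following ANS Forth sketch assumes a Class 1 system; integers stand in for code pointers.

```forth
\ Class 1 sketch: immediate macros compiling CORE return-stack words.
: >RR   POSTPONE >R ; IMMEDIATE
: RR>   POSTPONE R> ; IMMEDIATE
: >RR<  POSTPONE R> POSTPONE SWAP POSTPONE >R ; IMMEDIATE

\ //SWAP via the phrase >RR >RR< RR> given in its entry.
: //SWAP ( ucp1 ucp2 -- ucp2 ucp1 )  >RR >RR< RR> ;
: /XSWAP ( ucp x -- x ucp )          SWAP ;
: X/SWAP ( x ucp -- ucp x )          SWAP ;
```

Tracing 1 2 //SWAP: >RR moves 2 to the return stack, >RR< exchanges it with 1, and RR> brings 1 back, leaving 2 1.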


<end of normative part>
continued in the file OI-ratio.doc