Document: The proposed Open Interpreter Wordset, part 2 of 2 -- "Rationale"
Version: 2.3 -- 11 May 2002
Author: M.L.Gassanenko, 1999
This file: OI-ratio.doc, First part: OI-norma.doc
A.OI The optional Open Interpreter Wordset
A.OI.1 Introduction
A.OI.2 Additional terms and notation
A.OI.2.1 Definition of terms and classes of Open Interpreter systems
A.OI.2.1.1 The five classes of Open Interpreter systems
Class 5. It is possible that real Class 5 systems will not be able to support the Open Interpreter Data Access and Code Access word sets without environmental restrictions. If hardware does not permit unaligned code pointers to be stored as return addresses, the executable code memory space is most likely larger than the readable code memory space. For example, code memory may consist of 64K 16-bit words, but only the first 32K words may be accessed as 64K read-only bytes. In this situation, the full implementation of the code memory data access functionality is just not possible.
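The arithmetic of this example can be checked with a small sketch. Python is used here only as neutral modelling notation; the constants restate the example above, while the helper name byte_to_word and the assumption that byte address b lies in word b/2 are illustrative and not taken from any real system.

```python
# Model of the example: 64K 16-bit code words, of which only the first
# 32K words are also visible as 64K read-only bytes.
WORDS_EXECUTABLE = 64 * 1024   # code words reachable by the code interpreter
BYTES_READABLE = 64 * 1024     # bytes reachable by the data access words
BYTES_PER_WORD = 2             # 16-bit words

def byte_to_word(byte_addr):
    """Map a readable byte address to (word address, byte offset in word).

    Assumption: byte address b lies in word b // 2.
    """
    if not 0 <= byte_addr < BYTES_READABLE:
        raise ValueError("address outside the readable window")
    return divmod(byte_addr, BYTES_PER_WORD)

# Number of code words that can be read as data, and the number that
# can be executed but not read.
words_readable = BYTES_READABLE // BYTES_PER_WORD
words_unreadable = WORDS_EXECUTABLE - words_readable
```

Under these assumptions half of the executable code space has no readable byte address at all, which is why in-line data there cannot be fetched.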
In general, it may be recommended to write new code for at least Class 3; Class 5 is probably not worth the effort unless there is a prospect of porting the code to a Class 5 system.
A.OI.2.1.2 Definition of terms
A.OI.2.2 Notation
A.OI.2.2.1 Interpretation stack notation
A word having the return stack effect
( R: i*x -- j*x )
is assumed to have the following interpretation stack effect:
( I: i*x cp -- j*x cp )
and vice versa: only words that do not change IP (the top interpretation stack item) can be adequately described by a return stack diagram.
A.OI.2.2.2 Stored data notation
A.OI.3 Additional usage requirements
A.OI.3.1 Data types
The data type cp denotes an unaligned code pointer (ucp) for Classes 1-4, and an aligned code pointer (acp) for Class 5, because unaligned code pointers cannot be represented in the return stack representation on systems of Class 5.
The data types acp and ucp each have two representations: one on the data stack (acp-s and ucp-s, respectively) and one on the return stack (acp-r and ucp-r). The symbol acp denotes acp-s in data stack diagrams and acp-r in return stack diagrams. This approach was chosen because acp-s and acp-r (and likewise ucp-s and ucp-r) are logically the same value.
A.OI.3.2 Data type relationships
For Class 1,
acp = a-addr => cp = ucp = addr.
For Class 2, the return stack representation of code pointers is different from the data stack representation.
acp-s = a-addr => cp-s = ucp-s = addr,
acp-r => cp-r = ucp-r => x.
For Class 3, the code pointers are not necessarily data memory addresses.
acp-s => cp-s = ucp-s => u,
acp-r => cp-r = ucp-r => x.
For Class 4, the code pointers are not necessarily one-cell wide.
acp-s => cp-s = ucp-s => i*x,
acp-r => cp-r = ucp-r.
A system of Class 5 is a system of Class 4 with the environmental restriction that unaligned code pointers cannot be converted to the return stack representation. This restriction affects all words that accept or return the data type cp.
cp-s = acp-s => ucp-s => i*x,
cp-r = acp-r => ucp-r.
The symbols cp-s and cp-r above denote cp in the data stack and return stack representations, respectively.
A.OI.3.3 Threaded code memory addresses
The standard does not require the size (in bits) of one code memory address unit to be no greater than the size of a character. However, systems on which a code memory address unit is larger than a character may be unable to implement the Open Interpreter In-Line Data Access word set in a reasonably efficient way.
A.OI.3.4 The executable code and the code interpreter
The key to understanding return address manipulations is a dual view of the interpretation stack. The threaded code interpreter considers the return stack and the interpretation pointer (IP) as a single stack. The programmer manipulates only the return stack, because IP changes while the programmer's code executes, and writing to IP would result in an immediate control transfer. On the other hand, this is not a restriction: when we call an auxiliary procedure, the return stack becomes what the interpretation stack was. Any changes that have to be made to the interpretation stack, the auxiliary procedure makes to the return stack. When the procedure exits, the interpretation stack becomes what the return stack was.
The rule of thumb for writing code that changes the interpretation stack is: write code that does to the return stack what must be done to the interpretation stack, and put this code into an auxiliary procedure. When called, this procedure makes the required changes to the interpretation stack.
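The dual view can be illustrated with a toy threaded-code interpreter. This is only a sketch written in Python as modelling notation, not an implementation of the Open Interpreter wordset: the class ThreadedVM, the (body, index) representation of a code pointer, and the names EXIT-CALLER, INNER and MAIN are all hypothetical. The auxiliary procedure EXIT-CALLER removes the top return stack item, which is the IP of its caller; when EXIT-CALLER exits, control therefore resumes in the caller's caller.

```python
# Toy threaded-code interpreter (illustrative model only).
# A code pointer is modelled as a (body, index) pair.
class ThreadedVM:
    def __init__(self):
        self.data = []     # data stack
        self.rstack = []   # return stack
        self.ip = None     # interpretation pointer
        self.log = []      # names of the tracing primitives that ran
        self.colon = {}    # colon definitions: name -> list of tokens
        self.prims = {
            "R>": lambda vm: vm.data.append(vm.rstack.pop()),
            "DROP": lambda vm: vm.data.pop(),
        }

    def trace(self, name):
        # Define a primitive that only records that it was executed.
        self.prims[name] = lambda vm: vm.log.append(name)

    def run(self, name):
        self.ip = ([name], 0)
        while self.ip is not None:
            body, i = self.ip
            if i == len(body):
                # End of a body acts as EXIT: pop the return stack into IP.
                self.ip = self.rstack.pop() if self.rstack else None
                continue
            token = body[i]
            self.ip = (body, i + 1)          # IP advances before execution
            if token in self.colon:
                self.rstack.append(self.ip)  # nest: IP becomes a return address
                self.ip = (self.colon[token], 0)
            else:
                self.prims[token](self)

vm = ThreadedVM()
for name in ("A", "B", "C"):
    vm.trace(name)
vm.colon["EXIT-CALLER"] = ["R>", "DROP"]     # the auxiliary procedure
vm.colon["INNER"] = ["A", "EXIT-CALLER", "B"]
vm.colon["MAIN"] = ["INNER", "C"]
vm.run("MAIN")
print(vm.log)   # B is skipped: EXIT-CALLER made INNER return early
```

Inside EXIT-CALLER the return stack holds exactly what the interpretation stack of INNER was, so dropping its top item has the same effect as dropping IP in INNER.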
A.OI.3.5 Environmental queries
A.OI.4 Additional documentation requirements
A.OI.4.1 System documentation
A.OI.4.1.1 Implementation-defined options
A.OI.4.2 Program documentation
A program written for a standard system with environmental restrictions can run on a standard system. A standard system provides all of the functionality of the system with environmental restrictions, plus some additional functionality.
A program written for a non-standard system cannot run on a standard system. The functionality of a standard system is simply different from the functionality of the non-standard system for which the program is written.
For example, if the system does not implement the word /C@ , it is an environmental restriction. If the system allows in-line literal data only within the first 32K of code memory, it is an environmental restriction. A program aware of such peculiarities can still run on a standard system. But if the value returned by RR@ points to the called compiled token instead of the next compiled token, the system is non-standard, and a program aware of this peculiarity cannot run on a standard system.
A.OI.5 Compliance and labeling
A.OI.5.1 ANS Forth systems
A.OI.5.2 ANS Forth programs
A.OI.6 Glossary
A.OI.6.1 The Open Interpreter words
A.OI.6.1.0450 :
"colon"
OI
name Execution: ( i*x -- j*x ) ( I: k*x cp1 -- l*x acp3 )
1)
The initiation semantics of name has the interpretation stack effect
( I: cp1 -- cp1 acp2 ).
The rest of execution semantics of name has the interpretation stack effect
( I: k*x cp1 acp2 -- l*x acp3 ),
thus giving
( I: k*x cp1 -- l*x acp3 ).
2)
cp1 is not necessarily aligned because the word OI.6.1.???? RUSH enables one to start a colon definition with an unaligned cp1. But if name is invoked by the threaded code interpreter, cp1 cannot be unaligned.
3)
If name does not do return stack manipulations, its interpretation stack effect
( I: acp1 -- acp1 )
is a "sum" of the interpretation stack effects of:
name initiation
( I: acp1 -- acp1 acp2 ),
name body (IP changes while the body is being interpreted)
( I: acp1 acp2 -- acp1 acp4 ),
and EXIT (or run-time semantics of OI.6.1.0460 ;)
( I: acp1 acp4 -- acp1 ).
A.OI.6.1.???? RUSH
OI
The word RUSH makes it possible to get rid of an extraneous return stack element. If the word X does return stack manipulations, then the interpretation stack elements are arguments to it. Execution of X from inside an auxiliary definition differs from execution of X without an auxiliary definition because there is one more return address on the return stack. The word RUSH allows an auxiliary definition to call X as if X were called in place of the auxiliary definition.
A.OI.6.2 The open interpreter extension words
A.OI.6.2.???? RP@
"r-p-fetch"
OI-EXT
A.OI.6.2.???? RP!
"r-p-store"
OI-EXT
The purpose of these words is to provide non-local exits (which are required, for example, for a Prolog-like cut statement). These words may be found on most (if not all) Forth systems.
The value x used by these words is traditionally called the "return stack pointer".
A.OI.6.2.???? R-SAVE-SYS
"r-save-sys"
OI-EXT
An implementation may keep data that control nesting structures in registers. For example, it may keep do-loop parameters (count, limit) and the locals frame pointer (if not the locals themselves) in registers. Therefore, a program that implements non-local exits using RP! shall save such information using R-SAVE-SYS before obtaining the return stack pointer with RP@ and shall restore this information using R-RESTORE-SYS after changing the return stack pointer with RP!.
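The required ordering can be sketched with a small model, again in Python as modelling notation only. The class RegisterVM and its method names are hypothetical stand-ins for R-SAVE-SYS, R-RESTORE-SYS, RP@ and RP!, and the sketch assumes that the saved information is placed on the return stack and that the return stack pointer can be modelled as the stack depth; a real system may do either differently.

```python
# Model of a system that keeps the do-loop index in a register.
class RegisterVM:
    def __init__(self):
        self.rstack = []        # return stack
        self.loop_index = None  # do-loop index held in a "register"

    def r_save_sys(self):
        # Assumption: the register contents are saved on the return stack.
        self.rstack.append(self.loop_index)

    def r_restore_sys(self):
        self.loop_index = self.rstack.pop()

    def rp_fetch(self):
        # The return stack pointer is modelled as the stack depth.
        return len(self.rstack)

    def rp_store(self, rp):
        # Discard everything above the saved return stack pointer.
        del self.rstack[rp:]

vm = RegisterVM()
vm.rstack.append("ret-outer")
vm.loop_index = 7              # outer loop is at index 7

vm.r_save_sys()                # 1. save register state first
mark = vm.rp_fetch()           # 2. then obtain the return stack pointer

# Nested code runs: more return addresses, and an inner loop that
# overwrites the register.
vm.rstack.append("ret-inner")
vm.loop_index = 3
vm.rstack.append("ret-deeper")

vm.rp_store(mark)              # 3. non-local exit: reset the pointer
vm.r_restore_sys()             # 4. then restore the register state
```

Without steps 1 and 4 the outer loop would continue with the inner loop's index after the non-local exit.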
A.OI.6.3 The Open Interpreter threaded code access words
There is a wide class of applications that do not need dynamic code generation or run-time patching of generated code. Therefore, it may be quite reasonable to introduce environmental restrictions on the use of words that write to code space, for example, requiring that these words are not used to patch finished definitions. Such a system shall be labelled as "Providing the Open Interpreter threaded code access word set with environmental restrictions", and the restrictions shall be documented.
A.OI.6.3.???? TOKEN,
"token-comma"
OI-CODE
The difference between OI.6.3.???? TOKEN, and 6.2.0945 COMPILE, is that COMPILE, is allowed to do optimizations. If some word, say TUCK , is compiled with TOKEN, , the resulting compiled token is guaranteed to have the size of 1 TOKENS and to decompile (e.g. with the word TOKEN>) as TUCK , while if the same word is compiled with COMPILE, , the compiled token may be of some different size and decompile, for example, as SWAP OVER , or may be non-decompilable.
A.OI.6.3.???? TOKEN>
"token-from"
OI-CODE
A.OI.6.3.???? TOKEN+
"token-plus"
OI-CODE
The word TOKEN+ is not necessarily equivalent to the phrase 1 TOKENS /XSWAP /+. If the code memory address at the stack top points to a token compiled with TOKEN,, they are equivalent. But if the code memory address at the stack top points to a token compiled with COMPILE,, the word TOKEN+ is allowed to add the size of that token instead of adding the standard size of a token.
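The distinction between TOKEN, and COMPILE, and its consequence for TOKEN+ can be modelled as follows. This is a sketch in Python as modelling notation: the functions token_comma, compile_comma and token_plus are hypothetical stand-ins for the Forth words, the expansion of TUCK into SWAP OVER is the one optimization assumed here, and code memory is simplified to one list slot per address unit.

```python
TOKEN_SIZE = 1   # models "1 TOKENS", in code memory address units

code = []        # code memory: one list slot per address unit
sizes = {}       # address -> size of the compiled token starting there

def token_comma(word):
    """Model of TOKEN, : always compiles exactly one standard-size token."""
    addr = len(code)
    code.append(word)
    sizes[addr] = TOKEN_SIZE
    return addr

def compile_comma(word):
    """Model of COMPILE, : allowed to optimize (assumed: TUCK is inlined)."""
    addr = len(code)
    expansion = {"TUCK": ["SWAP", "OVER"]}.get(word, [word])
    code.extend(expansion)
    sizes[addr] = len(expansion)
    return addr

def token_plus(addr):
    """Model of TOKEN+ : steps over the token that actually starts at addr."""
    return addr + sizes[addr]

a = token_comma("TUCK")     # one token, decompiles as TUCK
b = compile_comma("TUCK")   # optimized, decompiles as SWAP OVER

same = token_plus(a) == a + TOKEN_SIZE     # equivalent after TOKEN,
differs = token_plus(b) != b + TOKEN_SIZE  # not equivalent after COMPILE,
```

In this model, stepping through code with a fixed increment of 1 TOKENS would land in the middle of the optimized token, whereas the TOKEN+ model always lands on the start of the next compiled token.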
A.OI.6.4 The Open Interpreter threaded code access extension words
A.OI.6.5 The Open Interpreter in-line data access words
A.OI.6.5.???? /@
"slash-fetch"
OI-INLINE
If return addresses are one-cell wide, and code memory is data memory, but alignment requirements for compiled tokens and data memory cells are different, that is, aligned code pointers are not aligned addresses, then the system can implement only Class 3.
A.OI.6.5.???? /+
"slash-plus"
OI-INLINE
A.OI.6.5.???? /ALLOT
"slash-allot"
OI-INLINE
A.OI.6.5.???? /GET
"slash-get"
OI-INLINE
A.OI.6.5.???? /PUT
"slash-put"
OI-INLINE
Code memory address units may have a different size from the data memory address units, and the phrase 1 CHARS /ALLOT 1 CHARS /ALLOT may give a different result from the phrase 2 CHARS /ALLOT.
Since the words /ALLOT and /+ may perform alignment on a code memory address unit boundary, data elements in code memory must be accessed in the same way as they were allocated.
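A small model shows why the two phrases can differ. It is a sketch in Python as modelling notation, assuming 8-bit characters and 16-bit code memory address units; the function slash_allot is a hypothetical stand-in for /ALLOT that takes a size in characters and rounds the reservation up to whole code memory address units.

```python
CHAR_BITS = 8    # assumed character size
UNIT_BITS = 16   # assumed code memory address unit size

def slash_allot(here, n_chars):
    """Return the code space pointer (in units) after reserving n_chars
    characters, rounded up to a whole number of address units."""
    bits = n_chars * CHAR_BITS
    units = (bits + UNIT_BITS - 1) // UNIT_BITS   # ceiling division
    return here + units

# 1 CHARS /ALLOT 1 CHARS /ALLOT : each reservation is padded to one unit.
twice_one = slash_allot(slash_allot(0, 1), 1)

# 2 CHARS /ALLOT : two characters fit exactly into one unit.
once_two = slash_allot(0, 2)
```

Under these assumptions the first phrase reserves two address units and the second only one, so data allocated one character at a time cannot be read back as a packed pair of characters.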
A.OI.6.6 The Open Interpreter in-line data access extension words
These words are meaningful only on Class 4 and Class 5 systems. On a Class 5 system, unaligned code pointers cannot be placed onto the return stack, and these are the only words that can operate on an unaligned code pointer.
<end of rationale>
<end of document>