view docs/Interpreter Operation.txt @ 125:0607e4e20702

Correct offset error for keyword table lookup
author William Astle <lost@l-w.ca>
date Sun, 07 Jan 2024 20:35:51 -0700
parents 1c1a0150fdda

This document is intended to describe the operation of the interpreter
including program text management, parsing, and execution.

In general, LWBasic preserves the line oriented nature of Color Basic but
extends it somewhat to be more flexible and more efficient to interpret. The
primary way it does this is to split the parsing and execution processes and
to pre-parse numeric and string constants. By doing this, it removes a lot
of complexity from the interpretation loop. Parsing is done when a line is
entered into the program, meaning that syntax errors can be detected
immediately instead of at run time. It also means the interpretation loop
does not have to do slow processing like finding the end of a statement.
This will be most noticeable for things like IF statements.

Parsing transforms the program into a byte code. This byte code will often
end up being larger than the original program text. The byte code consists
of a sequence of line structures, each of which begins with a pointer to the
next line followed by a 16 bit binary line number. If the pointer is NULL,
the end of the program has been reached.
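
As a rough illustration of the line structure described above, the following
Python sketch walks a chain of lines in a memory image. It is not the
interpreter's own code. It assumes 16 bit big-endian pointers that are
absolute addresses into the image, and it assumes that a structure whose
pointer is NULL is a bare terminator carrying no line of its own. The name
walk_lines is hypothetical.

```python
import struct


def walk_lines(mem, first_line):
    """Yield (line_number, body_address) for each line in a memory image.

    Assumptions (not stated in this document): pointers are 16 bit,
    big-endian, absolute addresses, and a NULL pointer marks a terminator
    entry that holds no line.
    """
    addr = first_line
    while True:
        (next_addr,) = struct.unpack_from(">H", mem, addr)
        if next_addr == 0:
            return  # NULL pointer: end of program
        (line_no,) = struct.unpack_from(">H", mem, addr + 2)
        # The operations of the line start right after the 4 byte header.
        yield line_no, addr + 4
        addr = next_addr


# Two lines (10 and 20), each with a body of a single EOL byte (00).
mem = bytearray(16)
struct.pack_into(">HH", mem, 4, 9, 10)   # line 10 at address 4, next at 9
struct.pack_into(">HH", mem, 9, 14, 20)  # line 20 at address 9, next at 14
lines = list(walk_lines(mem, 4))
```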

Each line consists of a sequence of "operations", each with zero or more
operands. Each of these is described below. Each section below starts with a
header line containing the operation code, which may be more than 8 bits, a
symbolic abbreviation for the operation, and an English short description.
Following that is a longer description of the operation code and the
encoding of its parameters.

It should be noted that various syntactic particles do not get encoded in
the final result even though they are keywords. The list of those is: TAB,
TO, SUB, THEN, ELSE, STEP, OFF, FN, USING, AS, ERR, ERROR, BRK, BREAK, RGB,
CMP. Thus they will appear in the keyword tables but not in the encoded
program.

Note also that some keywords serve as both commands and functions. In those
cases, there will be separate operation codes.

00 EOL End of Line

This operation signals the end of a program line. Interpretation will
continue with the next program line, or end if no further lines exist.


01 CONST0 Zero constant

Exactly what it says on the tin. This evaluates to an integer constant zero.
Because zero values are common, having a dedicated code for this is
beneficial for overall byte code compactness.

02 CONST1 One constant

Exactly what it says on the tin. Because one is a very common constant,
encoding it specifically in a single byte seems sensible as a means to keep
the byte code size smaller.

03 INT8 8 bit signed integer constant

This is a signed 8 bit integer constant. Most constants in programs are
small integers. By encoding these specially, we keep the byte code more
compact. This saves three bytes over encoding integers at 32 bits.

04 INT16 16 bit signed integer constant

This encoding is present to avoid taking up 32 bits for the integer data
when 16 bits will do. Again, this is intended to keep the byte code a bit
more compact. This saves two bytes over encoding integers at 32 bits.

05 INT32 32 bit signed integer constant

Exactly what it says on the tin.
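
To illustrate how the integer constant codes above fit together, here is a
small Python sketch that picks the shortest encoding for a value. It is only
a sketch: the function name encode_int is hypothetical, and the big-endian
byte order of the operand is an assumption, not something this document
specifies.

```python
import struct


def encode_int(n):
    """Return the shortest integer constant encoding for n.

    Operand byte order (big-endian) is an assumption.
    """
    if n == 0:
        return bytes([0x01])                          # CONST0
    if n == 1:
        return bytes([0x02])                          # CONST1
    if -128 <= n <= 127:
        return bytes([0x03]) + struct.pack(">b", n)   # INT8
    if -32768 <= n <= 32767:
        return bytes([0x04]) + struct.pack(">h", n)   # INT16
    if -2**31 <= n <= 2**31 - 1:
        return bytes([0x05]) + struct.pack(">i", n)   # INT32
    raise ValueError("value does not fit in 32 bits")
```

With this scheme a value such as 300 takes three bytes (code plus a 16 bit
operand) instead of the five bytes an INT32 would need.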

06 BCD48 BCD Floating Point

This is a 48 bit BCD floating point value where the first byte contains the
sign bit and 7 bit exponent (stored with a bias of 64). The remaining five
bytes contain the 10 BCD digits of the significand.

07 BCD16 BCD Floating Point (2 significant digits)
08 BCD24 BCD Floating Point (4 significant digits)
09 BCD32 BCD Floating Point (6 significant digits)

Because many numbers will only need a small number of significant digits,
encodings for numbers needing only two, four, or six significant digits are
provided. These are intended to keep the byte code more compact.
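
The BCD layouts above can be unpacked along the following lines. This Python
sketch only splits a constant into its fields and does not interpret the
value. It assumes the sign is the high bit of the first byte and that the
short forms use the same first byte as BCD48 followed by fewer digit bytes.
The names decode_bcd and BCD_DIGIT_BYTES are hypothetical.

```python
# Number of significand bytes following the sign/exponent byte.
BCD_DIGIT_BYTES = {0x06: 5, 0x07: 1, 0x08: 2, 0x09: 3}


def decode_bcd(code):
    """Split a BCD constant (starting at its operation code) into fields.

    Returns (negative, exponent, digits). Assumes the sign is bit 7 of the
    first operand byte; the exponent is stored with a bias of 64.
    """
    count = BCD_DIGIT_BYTES[code[0]]
    head = code[1]
    negative = bool(head & 0x80)
    exponent = (head & 0x7F) - 64
    # Each significand byte holds two BCD digits, high nibble first.
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in code[2:2 + count])
    return negative, exponent, digits
```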

0A STRING String constant

This encodes a string constant whose length fits in an 8 bit unsigned byte.
The first byte is the length, which may be zero, with the remaining bytes
being the string data. The string data may contain any binary values.

0B LSTRING Long string constant

This is exactly like STRING above but uses a 16 bit length field for
encoding very long strings. This will not normally occur in programs but is
included in case it is required.
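
A minimal sketch of choosing between the two string encodings follows. The
function name encode_string is hypothetical, and the big-endian order of the
16 bit length is an assumption.

```python
def encode_string(data):
    """Encode a string constant as STRING or LSTRING depending on length.

    data is a bytes object and may contain any binary values. The byte
    order of the 16 bit LSTRING length (big-endian) is an assumption.
    """
    if len(data) <= 0xFF:
        return bytes([0x0A, len(data)]) + data                        # STRING
    if len(data) <= 0xFFFF:
        return bytes([0x0B]) + len(data).to_bytes(2, "big") + data   # LSTRING
    raise ValueError("string too long for a 16 bit length")
```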

1D VARS Scalar variable reference

This is a reference to a scalar variable. It is followed by a single byte
holding the variable type (integer, floating point, string) in the upper 3
bits and the name length in the lower 5 bits, followed by the variable name
*without* a type sigil. This encoding is also used in the DIM command. Type
0 indicates an unspecified type (no sigil), which will be looked up at
runtime and defaults to floating point.
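
The type and length byte can be packed and unpacked as sketched below. Only
type 0 (unspecified) is defined in this document; the other type numbers
used here are placeholders chosen for illustration, as are the function
names. The sketch also reads "length" as the length of the variable name.

```python
TYPE_UNSPECIFIED = 0  # from this document: no sigil
# The following type numbers are hypothetical placeholders.
TYPE_INTEGER = 1
TYPE_FLOAT = 2
TYPE_STRING = 3


def encode_vars(name, var_type=TYPE_UNSPECIFIED):
    """Encode a VARS reference: code 1D, type/length byte, then the name."""
    raw = name.encode("ascii")
    if not 0 < len(raw) <= 31:
        raise ValueError("name length must fit in 5 bits")
    if not 0 <= var_type <= 7:
        raise ValueError("type must fit in 3 bits")
    return bytes([0x1D, (var_type << 5) | len(raw)]) + raw


def decode_vars(code):
    """Return (type, name) for a VARS reference starting at its code."""
    if code[0] != 0x1D:
        raise ValueError("not a VARS operation")
    var_type = code[1] >> 5       # upper 3 bits
    length = code[1] & 0x1F       # lower 5 bits
    return var_type, code[2:2 + length].decode("ascii")
```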

1E VARA Array variable reference

This is exactly like VARS except that the variable name is followed by a
sequence of expressions specifying the subscript values. The sequence of
expressions begins with a count (8 bits) followed by the
expressions. The expression count is required to allow skipping over the
subscript references without having to know how many dimensions an array
has. Further, it is not possible to know how many dimensions are required at
parse time. Note that this encoding is also used in the DIM command. Note
that type 0 indicates an unspecified type (no sigil) which will be looked up
at runtime and defaults to floating point.

1F EXPR Expression

This indicates an expression to be evaluated. It is followed by a sequence
of terms and operators to be evaluated. The expression is stored in postfix
order and will be evaluated using an expression evaluation stack. Each
operation will fetch zero or more operands from the evaluation stack, do its
calculation, and then push its result back onto the evaluation stack. When
an "end of expression" operator is encountered, the result is popped from
the stack and left in the result destination. Note that an end of expression
operator is required because unary operators exist.

Note that an expression will be converted back to infix notation when
listed using parentheses only as required to account for operator
precedence. This means that an expression entered with parentheses may be
listed back out without parentheses.
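
The listing behaviour described above can be sketched as follows. This is
one common way to turn postfix back into infix with minimal parentheses, not
necessarily how LWBasic does it. For readability it works on symbolic tokens
rather than byte codes, and the precedence table is a hypothetical subset.
All binary operators are treated as left associative.

```python
# Hypothetical precedence table; higher binds tighter.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
ATOM = 99  # precedence of a constant or variable


def to_infix(postfix):
    """Convert a postfix token list to infix text with minimal parentheses."""
    stack = []  # entries are (text, precedence of the outermost operator)
    for tok in postfix:
        if tok in PREC:
            rtext, rprec = stack.pop()
            ltext, lprec = stack.pop()
            p = PREC[tok]
            if lprec < p:
                ltext = "(" + ltext + ")"
            # Equal precedence on the right needs parentheses to preserve
            # grouping, for example A-(B-C).
            if rprec <= p:
                rtext = "(" + rtext + ")"
            stack.append((ltext + tok + rtext, p))
        else:
            stack.append((tok, ATOM))
    return stack.pop()[0]
```

For example, an expression entered as (A)+(B*C) is stored as A B C * + and
would be listed by this sketch as A+B*C.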

Postfix notation is used to store expressions because it avoids having to
deal with operator precedence at run time.

20 EOE End of expression operator

This signifies the end of an expression and triggers the expression
evaluator to return its result.

21 NEG Negation
22 ADD Addition
23 SUB Subtraction
24 MUL Multiplication
25 DIV Division
26 MOD Modulus
27 NOT Boolean not
28 AND Boolean and
29 OR Boolean or
2A XOR Boolean exclusive or
2B COM Bitwise complement
2C LAND Bitwise and
2D LOR Bitwise or
2E LXOR Bitwise exclusive or
2F CONCAT String concatenation
30 EQ Equality comparison
31 NE Inequality comparison
32 GT Greater than comparison
33 LT Less than comparison
34 GE Greater than or equal comparison
35 LE Less than or equal comparison
36 EXP Exponentiation

These are the basic arithmetic, boolean, and logical operators.
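
To make the stack based evaluation concrete, here is a minimal Python sketch
of a postfix evaluator for a handful of the codes listed above (CONST0,
CONST1, INT8, NEG, ADD, SUB, MUL, EOE). It is an illustration under
assumptions, not the interpreter's implementation: it starts just past the
EXPR code, handles integers only, and assumes that the left operand of a
binary operator is pushed first. The name eval_expr is hypothetical.

```python
import struct

CONST0, CONST1, INT8 = 0x01, 0x02, 0x03
EOE, NEG, ADD, SUB, MUL = 0x20, 0x21, 0x22, 0x23, 0x24


def eval_expr(code, pos=0):
    """Evaluate a postfix expression; return (result, position after EOE)."""
    stack = []
    while True:
        op = code[pos]
        pos += 1
        if op == CONST0:
            stack.append(0)
        elif op == CONST1:
            stack.append(1)
        elif op == INT8:
            stack.append(struct.unpack_from(">b", code, pos)[0])
            pos += 1
        elif op == NEG:
            stack.append(-stack.pop())
        elif op in (ADD, SUB, MUL):
            right = stack.pop()  # assumption: right operand is on top
            left = stack.pop()
            if op == ADD:
                stack.append(left + right)
            elif op == SUB:
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif op == EOE:
            return stack.pop(), pos
        else:
            raise ValueError(f"unsupported code {op:02X}")


# (5 - 1) * -3 in postfix: 5 1 SUB 3 NEG MUL EOE
code = bytes([0x03, 5, 0x02, 0x23, 0x03, 3, 0x21, 0x24, 0x20])
result, end = eval_expr(code)
```

Because precedence was resolved when the line was parsed, the loop never
needs to compare operators against each other at run time.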

40...7F: built in functions

40 SGN
41 INT
42 ABS
43 USR
44 RND
45 SIN
46 PEEK
47 LEN
48 STR$
49 VAL
4A ASC
4B CHR$
4C EOF
4D JOYSTK
4E LEFT$
4F RIGHT$
50 MID$
51 POINT
52 INKEY$
53 MEM
54 ATN
55 COS
56 TAN
57 EXP
58 FIX
59 LOG
5A POS
5B SQR
5C HEX$
5D VARPTR
5E INSTR
5F TIMER
60 PPOINT
61 STRING$
62 CVN
63 FREE
64 LOC
65 LOF
66 MKN$
67 LPEEK
68 BUTTON
69 ERNO/ERRNO
6A ERLIN/ERRLINE
6B ATTR

80...DF: commands

80 FOR
81 GOTO
82 GOSUB
83 REM
84 ' (Separate to REM because of different semantics)
85 IF
86 DATA
87 PRINT
88 ON
89 INPUT
8A END
8B NEXT
8C DIM
8D READ
8E RUN
8F RESTORE
90 RETURN
91 POP
92 STOP
93 POKE
94 CONT
95 LIST
96 CLEAR
97 NEW
98 OPEN
99 CLOSE
9A LLIST
9B SET
9C RESET
9D CLS
9E MOTOR
9F SOUND
A0 EXEC
A1 DEL
A2 EDIT
A3 TRON
A4 TROFF
A5 DEF
A6 LET
A7 LINE
A8 PCLS
A9 PSET
AA PRESET
AB SCREEN
AC PCLEAR
AD COLOR
AE CIRCLE
AF PAINT
B0 GET
B1 PUT
B2 DRAW
B3 PCOPY
B4 PMODE
B5 PLAY
B6 RENUM
B7 DIR
B8 DRIVE
B9 FIELD
BA FILES
BB KILL
BC LOAD
BD LSET
BE MERGE
BF RENAME
C0 RSET
C1 SAVE
C2 WRITE
C3 VERIFY
C4 UNLOAD
C5 DSKINI
C6 BACKUP
C7 COPY
C8 DSKI$
C9 DSKO$
CA DOS
CB WIDTH
CC PALETTE
CD LPOKE
CE LOCATE
CF ATTR
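
The code ranges used throughout this document can be summarised with a small
classifier. This Python sketch handles single byte codes only (the document
notes that some operation codes may be longer than 8 bits) and reports the
range a code falls in, whether or not that particular code is assigned. The
function name classify and the category labels are hypothetical.

```python
def classify(op):
    """Return a rough category for a single byte operation code."""
    if op == 0x00:
        return "end of line"
    if 0x01 <= op <= 0x0B:
        return "constant"
    if op in (0x1D, 0x1E):
        return "variable reference"
    if op == 0x1F:
        return "expression"
    if 0x20 <= op <= 0x36:
        return "operator"      # includes EOE at 20
    if 0x40 <= op <= 0x7F:
        return "function"      # range for built in functions
    if 0x80 <= op <= 0xDF:
        return "command"       # range for commands
    return "unassigned"
```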