LWOS: src/parse.s comparison

comparison src/parse.s @ 123:5681cdada362

Redo keyword table handling to handle keywords differing in length

Some keywords differ only due to length. That is, the shorter keyword matches the leading characters of the longer one. Make the keyword table builder and processor handle these cases. Also re-implement the handler based on evolved understanding of its requirements.

author	William Astle <lost@l-w.ca>
date	Mon, 01 Jan 2024 15:15:45 -0700
parents	5d5472b11ccd
children	8770e6f977c3

comparison


-:5660ce96a9b7
+:5681cdada362
 parse_nexttok15 tfr y,d                         ; fetch input location
 subd parse_tokenst              ; calculate length of token
 std val0+val.strlen             ; save the length of the identifier
 ldb #token_ident                ; set token type to identifier (variable name, probably)
 rts                             ; return token type, do not advance since we already did above
-; Parsing a potential keyword here. This works using a recursive lookup table. Each lookup table starts with a 18 bit
+; This routine parses tokens using the table at parse_wordtab. The table is structured as follows:
-; size entry for the table. Each entry is then 2 bytes. The first is the character to
+;
-; match for this entry. The second is either token_eot to indicate a sub table needs to be consulted, token_ident to
+; * two bytes which contain the length of the table less the two bytes for this length value
-; indicate that the token should be parsed as an identifier, or a token type code which indicates the value should
+; * a sequence of entries consisting of a single byte matching character and a token code followed
-; be accepted. If a sub table is to be consulted, the table will appear inline with the same format. Should matching
+;   by an optional sub table, structured exactly the same way.
-; fall off the end of a table, the character being considered will be "ungot" and processing will return back up the
+;
-; call chain, ungetting characters, until the top level at which point token_ident will be returned.
+; The optional subtable will be present if the token code is token_eot
 ;
-; If the match character is negative, the match character represents the number of characters to "unget" and then
+; If the character match is negative, it means a lookahead failed. The negative value is the number
-; return the specified token. This is for handling look-aheads.
+; of characters to unget and the token code is the token value to return. No other entries after this
-parse_nexttok16 pshs a,x                        ; save input character
+; in a table will be considered since this negative match is a global match.
-ldd ,x++                        ; get number of entries in the table
+;
-addd 1,s                        ; set pointer to end of table
+; When a token_eot match is found, if there are no further characters in the input, the match is
-std 1,s
+; determined to be invalid and processing continues with the next entry.
-parse_nexttok17 cmpa ,x++                       ; does this entry match?
+parse_wordtab0  leas 3,s                        ; clean up stack for sub table handling
-beq parse_nexttok21             ; brif so
+parse_wordtab   pshs a,x                        ; save input character and start of table
-ldb -2,x                        ; was this a look-ahead non-match?
+ldd ,x++                        ; get length of this table
-bpl parse_nexttok19             ; brif not
+addd 1,s                        ; calculate the address of the end of the table
-leay b,y                        ; back up the input pointer
+std 1,s                         ; save end address for comparison later
-ldb -1,x                        ; get match token
+lda ,s                          ; get back input character
-parse_nexttok18 puls a,x,pc                     ; clean up stack and return the matched token
+parse_wordtab1  ldb -1,x                        ; fetch token code for this entry
-parse_nexttok19 ldb -1,x                        ; is there a sub table?
+cmpa ,x++                       ; does this entry match?
-cmpb #token_eot
+bne parse_wordtab4              ; brif not
-bne parse_nexttok20             ; brif not
+cmpb #token_eot                 ; is it indicating a sub table?
-ldd ,x++                        ; move past the sub table
+bne parse_wordtab6              ; brif not
-leax d,x
+bsr parse_nextcharu             ; fetch next input character (for sub table match)
-parse_nexttok20 cmpx 1,s                        ; did we reach the end of this table?
+bne parse_wordtab0              ; brif we are going to check the sub table
-blo parse_nexttok17             ; brif not
+parse_wordtab2  ldd ,x++                        ; fetch length of sub table
-ldb #token_ident                ; flag identifier required
+leax d,x                        ; move past sub table
-puls a,x,pc                     ; restore input character, clean up stack, and return
+parse_wordtab3  lda ,s                          ; get back input character
-parse_nexttok21 ldb -1,x                        ; what token did we match?
+cmpx 1,s                        ; are we at the end of the table?
-cmpb #token_eot                 ; sub table?
+blo parse_wordtab1              ; brif not - check another entry
-bne parse_nexttok18             ; brif not - ding! ding! ding! we have a match
+comb                            ; indicate no match
-leas 3,s                        ; clean up stack
+puls a,x,pc                     ; clean up stack and return
-bsr parse_nextcharu             ; fetch next input character
+parse_wordtab4  lda -2,x                        ; get the match character
-bne parse_nexttok16             ; process sub table entries if we have input
+bmi parse_wordtab5              ; brif negative - lookahead fail
-ldb #token_ident                ; indicate we have an ident
+cmpb #token_eot                 ; is there a sub table to skip?
-leay -1,y                       ; unget the end of input
+beq parse_wordtab2              ; brif so - skip sub table
-rts
+bra parse_wordtab3              ; otherwise just move to the next entry
+parse_wordtab5  leay a,y                        ; move back the specified number of characters
+parse_wordtab6  clra                            ; clear C to indicate a match
+puls a,x,pc                     ; clean up stack and return
 parse_number    jmp parse_tokerr
 ; Relational token table, bits are > = <
 parse_reltab    fcb token_error
 fcb token_lt
 fcb token_eq
 parse_tokdef token_pop,parse_noop
 parse_tokdef token_to,parse_noop
 parse_tokdef token_and,parse_noop
 parse_tokdef token_or,parse_noop
 parse_tokdef token_go,parse_noop
+parse_tokdef token_as,parse_noop
+parse_tokdef token_asc,parse_noop
 parse_rem       rts
 *pragmapop list
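
The table layout described in the new parse_wordtab comment can be modeled outside the assembler. The following is a minimal Python sketch, not the project's actual table builder: the token code values, the `build_table` and `match` helpers, and the position convention (an index into a string rather than the Y register) are all assumptions made for illustration. It also assumes that a keyword which is a prefix of a longer one gets both a sub table entry and a plain entry, so that the end-of-input rule falls through to the plain entry.

```python
TOKEN_EOT = 0   # hypothetical code meaning "a sub table follows this entry"
TOKEN_AS = 1    # hypothetical token codes, not the real values from parse.s
TOKEN_ASC = 2

def build_table(words, fallback=None, unget=0):
    """Build a table from {keyword: token}: 2 byte big-endian length, then entries."""
    body = bytearray()
    for ch in sorted({w[0] for w in words}):
        tails = {w[1:]: tok for w, tok in words.items() if w[0] == ch}
        here = tails.pop("", None)          # token of a keyword ending at this character
        if tails:                           # longer keywords continue in a sub table
            if here is not None:
                sub = build_table(tails, here, 1)
            else:
                sub = build_table(tails, fallback, unget + 1 if fallback is not None else 0)
            body += bytes([ord(ch), TOKEN_EOT]) + sub
        if here is not None:                # plain entry, also used at end of input
            body += bytes([ord(ch), here])
    if fallback is not None:                # negative match byte: lookahead failed
        body += bytes([(-unget) & 0xFF, fallback])
    return len(body).to_bytes(2, "big") + bytes(body)

def match(table, text, pos=0):
    """Return (token, position after the keyword), or None if nothing matches."""
    if pos >= len(text):
        return None
    end = 2 + int.from_bytes(table[:2], "big")
    ch = ord(text[pos])
    i = 2
    while i < end:
        m, tok = table[i], table[i + 1]
        i += 2
        if m >= 0x80:                       # global match: unget and return the token
            return tok, pos + 1 - (0x100 - m)
        if tok == TOKEN_EOT:
            sub_len = int.from_bytes(table[i:i + 2], "big")
            sub = table[i:i + 2 + sub_len]
            i += 2 + sub_len                # step over the inline sub table
            if m == ch and pos + 1 < len(text):
                return match(sub, text, pos + 1)
            # no match, or no further input: carry on with the next entry
        elif m == ch:
            return tok, pos + 1
    return None

table = build_table({"AS": TOKEN_AS, "ASC": TOKEN_ASC})
print(table.hex(" "))
print(match(table, "ASC"), match(table, "AS X"), match(table, "AX"))
```

In this model "AS" followed by anything other than "C" lands on the negative entry in the innermost sub table, which ungets the one lookahead character and returns the shorter keyword's token. That is the case the two new parse_tokdef lines for token_as and token_asc rely on.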
