comparison src/parse.s @ 123:5681cdada362

Redo keyword table handling to handle keywords differing in length

Some keywords differ only due to length. That is, the shorter keyword matches the leading characters of the longer one. Make the keyword table builder and processor handle these cases. Also re-implement the handler based on evolved understanding of its requirements.
author William Astle <lost@l-w.ca>
date Mon, 01 Jan 2024 15:15:45 -0700
parents 5d5472b11ccd
children 8770e6f977c3
122:5660ce96a9b7 123:5681cdada362
158 parse_nexttok15 tfr y,d ; fetch input location 158 parse_nexttok15 tfr y,d ; fetch input location
159 subd parse_tokenst ; calculate length of token 159 subd parse_tokenst ; calculate length of token
160 std val0+val.strlen ; save the length of the identifier 160 std val0+val.strlen ; save the length of the identifier
161 ldb #token_ident ; set token type to identifier (variable name, probably) 161 ldb #token_ident ; set token type to identifier (variable name, probably)
162 rts ; return token type, do not advance since we already did above 162 rts ; return token type, do not advance since we already did above
163 ; Parsing a potential keyword here. This works using a recursive lookup table. Each lookup table starts with a 16 bit 163 ; This routine parses tokens using the table at parse_wordtab. The table is structured as follows:
164 ; size entry for the table. Each entry is then 2 bytes. The first is the character to 164 ;
165 ; match for this entry. The second is either token_eot to indicate a sub table needs to be consulted, token_ident to 165 ; * two bytes which contain the length of the table less the two bytes for this length value
166 ; indicate that the token should be parsed as an identifier, or a token type code which indicates the value should 166 ; * a sequence of entries consisting of a single byte matching character and a token code followed
167 ; be accepted. If a sub table is to be consulted, the table will appear inline with the same format. Should matching 167 ; by an optional sub table, structured exactly the same way.
168 ; fall off the end of a table, the character being considered will be "ungot" and processing will return back up the 168 ;
169 ; call chain, ungetting characters, until the top level at which point token_ident will be returned. 169 ; The optional subtable will be present if the token code is token_eot
170 ; 170 ;
171 ; If the match character is negative, the match character represents the number of characters to "unget" and then 171 ; If the character match is negative, it means a lookahead failed. The negative value is the number
172 ; return the specified token. This is for handling look-aheads. 172 ; of characters to unget and the token code is the token value to return. No other entries after this
173 parse_nexttok16 pshs a,x ; save input character 173 ; in a table will be considered since the negative match is a global match.
174 ldd ,x++ ; get number of entries in the table 174 ;
175 addd 1,s ; set pointer to end of table 175 ; When a token_eot match is found, if there are no further characters in the input, the match is
176 std 1,s 176 ; determined to be invalid and processing continues with the next entry.
177 parse_nexttok17 cmpa ,x++ ; does this entry match? 177 parse_wordtab0 leas 3,s ; clean up stack for sub table handling
178 beq parse_nexttok21 ; brif so 178 parse_wordtab pshs a,x ; save input character and start of table
179 ldb -2,x ; was this a look-ahead non-match? 179 ldd ,x++ ; get length of this table
180 bpl parse_nexttok19 ; brif not 180 addd 1,s ; calculate the address of the end of the table
181 leay b,y ; back up the input pointer 181 std 1,s ; save end address for comparison later
182 ldb -1,x ; get match token 182 lda ,s ; get back input character
183 parse_nexttok18 puls a,x,pc ; clean up stack and return the matched token 183 parse_wordtab1 ldb -1,x ; fetch token code for this entry
184 parse_nexttok19 ldb -1,x ; is there a sub table? 184 cmpa ,x++ ; does this entry match?
185 cmpb #token_eot 185 bne parse_wordtab4 ; brif not
186 bne parse_nexttok20 ; brif not 186 cmpb #token_eot ; is it indicating a sub table?
187 ldd ,x++ ; move past the sub table 187 bne parse_wordtab6 ; brif not
188 leax d,x 188 bsr parse_nextcharu ; fetch next input character (for sub table match)
189 parse_nexttok20 cmpx 1,s ; did we reach the end of this table? 189 bne parse_wordtab0 ; brif we are going to check the sub table
190 blo parse_nexttok17 ; brif not 190 parse_wordtab2 ldd ,x++ ; fetch length of sub table
191 ldb #token_ident ; flag identifier required 191 leax d,x ; move past sub table
192 puls a,x,pc ; restore input character, clean up stack, and return 192 parse_wordtab3 lda ,s ; get back input character
193 parse_nexttok21 ldb -1,x ; what token did we match? 193 cmpx 1,s ; are we at the end of the table?
194 cmpb #token_eot ; sub table? 194 blo parse_wordtab1 ; brif not - check another entry
195 bne parse_nexttok18 ; brif not - ding! ding! ding! we have a match 195 comb ; indicate no match
196 leas 3,s ; clean up stack 196 puls a,x,pc ; clean up stack and return
197 bsr parse_nextcharu ; fetch next input character 197 parse_wordtab4 lda -2,x ; get the match character
198 bne parse_nexttok16 ; process sub table entries if we have input 198 bmi parse_wordtab5 ; brif negative - lookahead fail
199 ldb #token_ident ; indicate we have an ident 199 cmpb #token_eot ; is there a sub table to skip?
200 leay -1,y ; unget the end of input 200 beq parse_wordtab2 ; brif so - skip sub table
201 rts 201 bra parse_wordtab3 ; otherwise just move to the next entry
202 parse_wordtab5 leay a,y ; move back the specified number of characters
203 parse_wordtab6 clra ; clear C to indicate a match
204 puls a,x,pc ; clean up stack and return
202 parse_number jmp parse_tokerr 205 parse_number jmp parse_tokerr
203 ; Relational token table, bits are > = < 206 ; Relational token table, bits are > = <
204 parse_reltab fcb token_error 207 parse_reltab fcb token_error
205 fcb token_lt 208 fcb token_lt
206 fcb token_eq 209 fcb token_eq
283 parse_tokdef token_pop,parse_noop 286 parse_tokdef token_pop,parse_noop
284 parse_tokdef token_to,parse_noop 287 parse_tokdef token_to,parse_noop
285 parse_tokdef token_and,parse_noop 288 parse_tokdef token_and,parse_noop
286 parse_tokdef token_or,parse_noop 289 parse_tokdef token_or,parse_noop
287 parse_tokdef token_go,parse_noop 290 parse_tokdef token_go,parse_noop
291 parse_tokdef token_as,parse_noop
292 parse_tokdef token_asc,parse_noop
288 parse_rem rts 293 parse_rem rts
289 294
290 *pragmapop list 295 *pragmapop list
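
The table layout described in the new parse_wordtab comment block can be modelled outside the assembler. The following is a minimal Python sketch of that layout and lookup, not a transcription of the 6809 routine. The token values, the helper names (table, entry, lookup) and the sample AS/ASC table are all hypothetical, and the returned position is assumed to be the index of the last character of the matched keyword.

```python
TOKEN_EOT = 0x00   # hypothetical value: "a sub table follows this entry"
TOKEN_AS = 0x01    # hypothetical token codes, not the real ones
TOKEN_ASC = 0x02

def table(*entries):
    """Prefix the entries with a two byte length that excludes itself."""
    body = b"".join(entries)
    return len(body).to_bytes(2, "big") + body

def entry(match, token, sub=b""):
    """One entry: match byte, token code, optional inline sub table."""
    return bytes([match & 0xFF, token]) + sub

def lookup(tab, text, pos=0, start=0):
    """Return (token, index of last matched char), or None on no match."""
    length = int.from_bytes(tab[start:start + 2], "big")
    i = start + 2
    end = i + length
    ch = ord(text[pos])
    while i < end:
        match, token = tab[i], tab[i + 1]
        i += 2
        if match >= 0x80:
            # Negative match byte: lookahead failed. Unget that many
            # characters and accept the token; later entries are ignored.
            return token, pos + (match - 0x100)
        if token == TOKEN_EOT:
            sub_len = int.from_bytes(tab[i:i + 2], "big")
            if match == ch and pos + 1 < len(text):
                # Descend into the sub table; there is no backtracking.
                return lookup(tab, text, pos + 1, i)
            # No match, or no further input: skip the sub table.
            i += 2 + sub_len
        elif match == ch:
            return token, pos
    return None

# Sample table for the keywords AS and ASC (assumed layout).
WORDTAB = table(
    entry(ord("A"), TOKEN_EOT, table(
        entry(ord("S"), TOKEN_EOT, table(
            entry(ord("C"), TOKEN_ASC),
            entry(-1, TOKEN_AS),      # any other character: unget it, accept AS
        )),
        entry(ord("S"), TOKEN_AS),    # covers "AS" at the end of the input
    )),
)
```

In this model, "AS" followed by any character other than "C" falls through to the negative entry, which ungets the lookahead character and returns the shorter keyword. "AS" at the end of the input cannot enter the innermost sub table, so it is picked up by the second "S" entry instead.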