comparison src/parse.s @ 123:5681cdada362
Redo keyword table handling to handle keywords differing in length
Some keywords differ only due to length. That is, the shorter keyword
matches the leading characters of the longer one. Make the keyword table
builder and processor handle these cases. Also re-implement the handler
based on evolved understanding of its requirements.
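
The prefix problem described above (one keyword being the leading characters of another, such as `AS` and `ASC`) can be illustrated with a small table builder. This is only a sketch in Python, not the project's actual builder: the token values, the `build_table` and `_emit` names, the trailing `-1` lookahead-fail entry, and the extra plain entry emitted after a sub table (to cover end of input) are all assumptions derived from the table format comment in this changeset.

```python
# Hypothetical token codes; the real values live in the project's sources.
TOKEN_EOT = 0x00   # marks "a sub table follows this entry"
TOKEN_AS = 0x10
TOKEN_ASC = 0x11


def build_table(keywords):
    """Build a nested keyword table from a {keyword: token} mapping."""
    trie = {}
    for word, token in keywords.items():
        node = trie
        for c in word:
            node = node.setdefault(c, {})
        node[""] = token          # the "" key marks the end of a keyword
    return _emit(trie)


def _emit(node):
    """Emit one table: 16-bit big-endian length, then the entries."""
    body = bytearray()
    for c in sorted(k for k in node if k):
        child = node[c]
        if any(k for k in child):
            # Longer keywords continue from here: entry plus inline sub table.
            body += bytes([ord(c), TOKEN_EOT]) + _emit(child)
            if "" in child:
                # Assumption: a plain entry after the sub table catches the
                # shorter keyword when the input ends right after it.
                body += bytes([ord(c), child[""]])
        else:
            body += bytes([ord(c), child[""]])
    if "" in node:
        # Lookahead failed: unget one character (-1) and return the
        # shorter keyword's token.
        body += bytes([0xFF, node[""]])
    return len(body).to_bytes(2, "big") + body


table = build_table({"AS": TOKEN_AS, "ASC": TOKEN_ASC})
print(table.hex(" "))
```

In this sketch the shorter keyword is not a separate top-level entry. It is recovered either by the negative entry at the end of the sub table (when the next character does not continue a longer keyword) or by the plain entry after the sub table (when the input ends).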
author | William Astle <lost@l-w.ca> |
---|---|
date | Mon, 01 Jan 2024 15:15:45 -0700 |
parents | 5d5472b11ccd |
children | 8770e6f977c3 |
122:5660ce96a9b7 | 123:5681cdada362 |
---|---|
158 parse_nexttok15 tfr y,d ; fetch input location | 158 parse_nexttok15 tfr y,d ; fetch input location |
159 subd parse_tokenst ; calculate length of token | 159 subd parse_tokenst ; calculate length of token |
160 std val0+val.strlen ; save the length of the identifier | 160 std val0+val.strlen ; save the length of the identifier |
161 ldb #token_ident ; set token type to identifier (variable name, probably) | 161 ldb #token_ident ; set token type to identifier (variable name, probably) |
162 rts ; return token type, do not advance since we already did above | 162 rts ; return token type, do not advance since we already did above |
163 ; Parsing a potential keyword here. This works using a recursive lookup table. Each lookup table starts with a 16 bit | 163 ; This routine parses tokens using the table at parse_wordtab. The table is structured as follows: |
164 ; size entry for the table. Each entry is then 2 bytes. The first is the character to | 164 ; |
165 ; match for this entry. The second is either token_eot to indicate a sub table needs to be consulted, token_ident to | 165 ; * two bytes which contain the length of the table less the two bytes for this length value |
166 ; indicate that the token should be parsed as an identifier, or a token type code which indicates the value should | 166 ; * a sequence of entries consisting of a single byte matching character and a token code followed |
167 ; be accepted. If a sub table is to be consulted, the table will appear inline with the same format. Should matching | 167 ; by an optional sub table, structured exactly the same way. |
168 ; fall off the end of a table, the character being considered will be "ungot" and processing will return back up the | 168 ; |
169 ; call chain, ungetting characters, until the top level at which point token_ident will be returned. | 169 ; The optional subtable will be present if the token code is token_eot |
170 ; | 170 ; |
171 ; If the match character is negative, the match character represents the number of characters to "unget" and then | 171 ; If the character match is negative, it means a lookahead failed. The negative value is the number |
172 ; return the specified token. This is for handling look-aheads. | 172 ; of characters to unget and the token code is the token value to return. No other entries after this |
173 parse_nexttok16 pshs a,x ; save input character | 173 ; in a table will be considered since this negative match is a global match. |
174 ldd ,x++ ; get number of entries in the table | 174 ; |
175 addd 1,s ; set pointer to end of table | 175 ; When a token_eot match is found, if there are no further characters in the input, the match is |
176 std 1,s | 176 ; determined to be invalid and processing continues with the next entry. |
177 parse_nexttok17 cmpa ,x++ ; does this entry match? | 177 parse_wordtab0 leas 3,s ; clean up stack for sub table handling |
178 beq parse_nexttok21 ; brif so | 178 parse_wordtab pshs a,x ; save input character and start of table |
179 ldb -2,x ; was this a look-ahead non-match? | 179 ldd ,x++ ; get length of this table |
180 bpl parse_nexttok19 ; brif not | 180 addd 1,s ; calculate the address of the end of the table |
181 leay b,y ; back up the input pointer | 181 std 1,s ; save end address for comparison later |
182 ldb -1,x ; get match token | 182 lda ,s ; get back input character |
183 parse_nexttok18 puls a,x,pc ; clean up stack and return the matched token | 183 parse_wordtab1 ldb -1,x ; fetch token code for this entry |
184 parse_nexttok19 ldb -1,x ; is there a sub table? | 184 cmpa ,x++ ; does this entry match? |
185 cmpb #token_eot | 185 bne parse_wordtab4 ; brif not |
186 bne parse_nexttok20 ; brif not | 186 cmpb #token_eot ; is it indicating a sub table? |
187 ldd ,x++ ; move past the sub table | 187 bne parse_wordtab6 ; brif not |
188 leax d,x | 188 bsr parse_nextcharu ; fetch next input character (for sub table match) |
189 parse_nexttok20 cmpx 1,s ; did we reach the end of this table? | 189 bne parse_wordtab0 ; brif we are going to check the sub table |
190 blo parse_nexttok17 ; brif not | 190 parse_wordtab2 ldd ,x++ ; fetch length of sub table |
191 ldb #token_ident ; flag identifier required | 191 leax d,x ; move past sub table |
192 puls a,x,pc ; restore input character, clean up stack, and return | 192 parse_wordtab3 lda ,s ; get back input character |
193 parse_nexttok21 ldb -1,x ; what token did we match? | 193 cmpx 1,s ; are we at the end of the table? |
194 cmpb #token_eot ; sub table? | 194 blo parse_wordtab1 ; brif not - check another entry |
195 bne parse_nexttok18 ; brif not - ding! ding! ding! we have a match | 195 comb ; indicate no match |
196 leas 3,s ; clean up stack | 196 puls a,x,pc ; clean up stack and return |
197 bsr parse_nextcharu ; fetch next input character | 197 parse_wordtab4 lda -2,x ; get the match character |
198 bne parse_nexttok16 ; process sub table entries if we have input | 198 bmi parse_wordtab5 ; brif negative - lookahead fail |
199 ldb #token_ident ; indicate we have an ident | 199 cmpb #token_eot ; is there a sub table to skip? |
200 leay -1,y ; unget the end of input | 200 beq parse_wordtab2 ; brif so - skip sub table |
201 rts | 201 bra parse_wordtab3 ; otherwise just move to the next entry |
202 parse_wordtab5 leay a,y ; move back the specified number of characters | |
203 parse_wordtab6 clra ; clear C to indicate a match | |
204 puls a,x,pc ; clean up stack and return | |
202 parse_number jmp parse_tokerr | 205 parse_number jmp parse_tokerr |
203 ; Relational token table, bits are > = < | 206 ; Relational token table, bits are > = < |
204 parse_reltab fcb token_error | 207 parse_reltab fcb token_error |
205 fcb token_lt | 208 fcb token_lt |
206 fcb token_eq | 209 fcb token_eq |
283 parse_tokdef token_pop,parse_noop | 286 parse_tokdef token_pop,parse_noop |
284 parse_tokdef token_to,parse_noop | 287 parse_tokdef token_to,parse_noop |
285 parse_tokdef token_and,parse_noop | 288 parse_tokdef token_and,parse_noop |
286 parse_tokdef token_or,parse_noop | 289 parse_tokdef token_or,parse_noop |
287 parse_tokdef token_go,parse_noop | 290 parse_tokdef token_go,parse_noop |
291 parse_tokdef token_as,parse_noop | |
292 parse_tokdef token_asc,parse_noop | |
288 parse_rem rts | 293 parse_rem rts |
289 | 294 |
290 *pragmapop list | 295 *pragmapop list |
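
The table format described in the new `parse_wordtab` comment can be modelled outside of assembly. The following is a minimal Python sketch of the matching rules as the comment states them, not a translation of the 6809 routine: the token values, the hand-encoded `TABLE` (covering only `AS` and `ASC`), and the `match_keyword` name are assumptions made for illustration. It returns the token and the number of characters consumed, or `None` where the assembly version would signal no match.

```python
# Hypothetical token codes; the real values live in the project's sources.
TOKEN_EOT = 0x00   # entry is followed by an inline sub table
TOKEN_AS = 0x10
TOKEN_ASC = 0x11

# Hand-encoded example table (an assumption, not taken from the project):
#   length 14: 'A' eot
#     length 10: 'S' eot
#       length 4: 'C' ASC, -1 AS      (lookahead fail: unget 1, return AS)
#     'S' AS                          (used when input ends after "AS")
TABLE = bytes.fromhex("000e 4100 000a 5300 0004 4311 ff10 5310")


def _skip_subtable(table, x):
    """Return the offset just past the sub table starting at x."""
    return x + 2 + int.from_bytes(table[x:x + 2], "big")


def match_keyword(table, text):
    """Return (token, chars_consumed) for a keyword at the start of text."""
    if not text:
        return None
    tpos, ipos = 0, 0
    while True:
        x = tpos + 2
        end = x + int.from_bytes(table[tpos:tpos + 2], "big")
        cur = text[ipos]
        descend = False
        while x < end and not descend:
            ch, tok = table[x], table[x + 1]
            x += 2
            if ch == cur:
                if tok != TOKEN_EOT:
                    return tok, ipos + 1          # direct match
                if ipos + 1 < len(text):
                    descend = True                # x points at the sub table
                else:
                    x = _skip_subtable(table, x)  # no input left: next entry
            elif ch >= 0x80:
                # Negative match character: unget that many characters.
                return tok, ipos + 1 + (ch - 0x100)
            elif tok == TOKEN_EOT:
                x = _skip_subtable(table, x)
        if not descend:
            return None                           # fell off the end: no match
        tpos, ipos = x, ipos + 1


print(match_keyword(TABLE, b"ASC"))
print(match_keyword(TABLE, b"AS X"))
```

As in the comment, a negative entry matches any character, so nothing placed after it in the same table is ever examined.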