Mercurial > hg > index.cgi
comparison src/int.s @ 113:0aac26453849
Actually implement the 32 bit integer division algorithm correctly
author | William Astle <lost@l-w.ca> |
---|---|
date | Tue, 26 Dec 2023 23:37:45 -0700 |
parents | 98b0646360e1 |
children |
comparison
equal
deleted
inserted
replaced
112:98b0646360e1 | 113:0aac26453849 |
---|---|
382 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; | 382 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
383 ; Divide 32 bit integer in fpa0 significand by 32 bit integer in fpa1 significand, both treated as unsigned. Leave | 383 ; Divide 32 bit integer in fpa0 significand by 32 bit integer in fpa1 significand, both treated as unsigned. Leave |
384 ; quotient at fpaextra...fpaextra+3 and remainder at fpaextra+4...fpaextra+7; does not check for division by zero | 384 ; quotient at fpaextra...fpaextra+3 and remainder at fpaextra+4...fpaextra+7; does not check for division by zero |
385 ; which will result in a quotient of 0xffffffff and a remainder will be the dividend. It will not get suck in a loop. | 385 ; which will result in a quotient of 0xffffffff and a remainder will be the dividend. It will not get suck in a loop. |
386 ; | 386 ; |
387 ; Algorithm is basically pencil and paper long division. We check to see if the divisor "goes" at each step by doing | 387 ; Uses a 96 bit temporary at fpaextra. The high 32 bits are the result and the mid 32 bits will be the remainder. The |
388 ; a trial subtraction without saving the result. If it doesn't go, we just loop around again. If it does go, we stash | 388 ; remaining 32 bits are used to avoid modifing the fpa0 significand. The extra 4 bytes could be avoided by simply |
389 ; a 1 bit in the quotient and actually do the subtraction. Then go loop around again. Doing it this way rather than | 389 ; clobbering the fpa0 significand. |
390 ; with an actual subtraction and then undoing it with addition saves two store instructions on the comparison saves | 390 ; |
391 ; having to do a restore in the no-go case which is going to be quite common with values whose upper bits are | 391 ; The algorithm is basically the pencil and paper long division algorithm. First, the dividend is shifted left one |
392 ; mostly zeroes, thus it makes the operations faster in that case, for integers. (Floating point is a different | 392 ; bit into the remainder. If a carry occurs from that, the subtraction will succeed. If not, do a trial subtraction |
393 ; problem.) | 393 ; which is just the subtraction without saving the result. If that results in a carry, then the divisor doesn't "go" |
394 ; at this bit position so the quotient bit will be zero. If there is no carry here or there was a carry on the shift, | |
395 ; it does "go" so the quotient bit will be one. If it went, actually do the subtraction. In either event, shift the | |
396 ; quotient bit into the accumulated quotient. Do this for all 32 bits. | |
397 ; | |
398 ; The carry has to be checked on the shift to simulate doing what is basically a 33 bit subtraction. | |
394 util_div32 ldd fpa0+fpa.sig+2 ; copy dividend to result location | 399 util_div32 ldd fpa0+fpa.sig+2 ; copy dividend to result location |
395 std fpaextra+6 | 400 std fpaextra+10 |
396 ldd fpa0+fpa.sig | 401 ldd fpa0+fpa.sig |
397 std fpaextra+4 | 402 std fpaextra+8 |
398 ldb #32 ; do 32 bits | 403 ldb #32 ; do 64 bits - will give us a remainder |
399 stb fpa0+fpa.exp ; save counter somewhere because we don't have enough registers | 404 stb fpa0+fpa.exp ; save counter somewhere because we don't have enough registers |
400 ldd zero ; zero out remainder | 405 ldd zero ; zero out remainder |
406 std fpaextra | |
407 std fpaextra+2 | |
401 std fpaextra+4 | 408 std fpaextra+4 |
402 std fpaextra+6 | 409 std fpaextra+6 |
403 util_div32a lsl fpaextra+3 ; shift dividend residue into remainder | 410 util_div32a lsl fpaextra+11 ; shift residue and remainder over |
404 rol fpaextra+2 | 411 rol fpaextra+10 |
405 rol fpaextra+1 | 412 rol fpaextra+9 |
406 rol fpaextra | 413 rol fpaextra+8 |
407 rol fpaextra+7 | 414 rol fpaextra+7 |
408 rol fpaextra+6 | 415 rol fpaextra+6 |
409 rol fpaextra+5 | 416 rol fpaextra+5 |
410 rol fpaextra+4 | 417 rol fpaextra+4 |
411 ldd fpaextra+6 ; now subtract divisor from remainder | 418 pshs cc ; save "goes" status from shift |
419 bcs util_div32b ; brif it definitely goes | |
420 ldd fpaextra+6 ; do a trial subtraction | |
412 subd fpa1+fpa.sig+2 | 421 subd fpa1+fpa.sig+2 |
413 ldd fpaextra+4 | 422 ldd fpaextra+4 |
414 sbcb fpa1+fpa.sig+1 | 423 sbcb fpa1+fpa.sig+1 |
415 sbca fpa1+fpa.sig | 424 sbca fpa1+fpa.sig |
416 bcs util_div32b ; brif it doesn't go - don't subtract or set bit | 425 bcs util_div32c ; brif it doesn't go - goes status on stack is right |
417 inc fpaextra+3 ; set quotient bit | 426 inc ,s ; set "C" on the stack |
418 ldd fpaextra+6 ; actually do the subtraction | 427 util_div32b ldd fpaextra+6 ; actually do the subtraction |
419 subd fpa1+fpa.sig+2 | 428 subd fpa1+fpa.sig+2 |
420 std fpaextra+6 | 429 std fpaextra+6 |
421 ldd fpaextra+4 | 430 ldd fpaextra+4 |
422 sbcb fpa1+fpa.sig+1 | 431 sbcb fpa1+fpa.sig+1 |
423 sbca fpa1+fpa.sig | 432 sbca fpa1+fpa.sig |
424 std fpaextra+4 | 433 std fpaextra+4 |
425 util_div32b dec fpa0+fpa.exp ; done all 32 bits? | 434 util_div32c puls cc ; get back "goes" status |
426 bne util_div32a ; do another | 435 rol fpaextra+3 ; shift quotient over and put the new bit in |
436 rol fpaextra+2 | |
437 rol fpaextra+1 | |
438 rol fpaextra | |
439 dec fpa0+fpa.exp ; done 32 bits? | |
440 bne util_div32a ; brif not - handle another bit position | |
441 rts | |
427 *pragmapop list | 442 *pragmapop list |