comparison src/int.s @ 113:0aac26453849

Actually implement the 32 bit integer division algorithm correctly
author William Astle <lost@l-w.ca>
date Tue, 26 Dec 2023 23:37:45 -0700
parents 98b0646360e1
children
comparison
equal deleted inserted replaced
112:98b0646360e1 113:0aac26453849
382 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 382 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
383 ; Divide 32 bit integer in fpa0 significand by 32 bit integer in fpa1 significand, both treated as unsigned. Leave 383 ; Divide 32 bit integer in fpa0 significand by 32 bit integer in fpa1 significand, both treated as unsigned. Leave
384 ; quotient at fpaextra...fpaextra+3 and remainder at fpaextra+4...fpaextra+7; does not check for division by zero 384 ; quotient at fpaextra...fpaextra+3 and remainder at fpaextra+4...fpaextra+7; does not check for division by zero
385 ; which will result in a quotient of 0xffffffff and a remainder will be the dividend. It will not get suck in a loop. 385 ; which will result in a quotient of 0xffffffff and a remainder will be the dividend. It will not get suck in a loop.
386 ; 386 ;
387 ; Algorithm is basically pencil and paper long division. We check to see if the divisor "goes" at each step by doing 387 ; Uses a 96 bit temporary at fpaextra. The high 32 bits are the result and the mid 32 bits will be the remainder. The
388 ; a trial subtraction without saving the result. If it doesn't go, we just loop around again. If it does go, we stash 388 ; remaining 32 bits are used to avoid modifing the fpa0 significand. The extra 4 bytes could be avoided by simply
389 ; a 1 bit in the quotient and actually do the subtraction. Then go loop around again. Doing it this way rather than 389 ; clobbering the fpa0 significand.
390 ; with an actual subtraction and then undoing it with addition saves two store instructions on the comparison saves 390 ;
391 ; having to do a restore in the no-go case which is going to be quite common with values whose upper bits are 391 ; The algorithm is basically the pencil and paper long division algorithm. First, the dividend is shifted left one
392 ; mostly zeroes, thus it makes the operations faster in that case, for integers. (Floating point is a different 392 ; bit into the remainder. If a carry occurs from that, the subtraction will succeed. If not, do a trial subtraction
393 ; problem.) 393 ; which is just the subtraction without saving the result. If that results in a carry, then the divisor doesn't "go"
394 ; at this bit position so the quotient bit will be zero. If there is no carry here or there was a carry on the shift,
395 ; it does "go" so the quotient bit will be one. If it went, actually do the subtraction. In either event, shift the
396 ; quotient bit into the accumulated quotient. Do this for all 32 bits.
397 ;
398 ; The carry has to be checked on the shift to simulate doing what is basically a 33 bit subtraction.
394 util_div32 ldd fpa0+fpa.sig+2 ; copy dividend to result location 399 util_div32 ldd fpa0+fpa.sig+2 ; copy dividend to result location
395 std fpaextra+6 400 std fpaextra+10
396 ldd fpa0+fpa.sig 401 ldd fpa0+fpa.sig
397 std fpaextra+4 402 std fpaextra+8
398 ldb #32 ; do 32 bits 403 ldb #32 ; do 64 bits - will give us a remainder
399 stb fpa0+fpa.exp ; save counter somewhere because we don't have enough registers 404 stb fpa0+fpa.exp ; save counter somewhere because we don't have enough registers
400 ldd zero ; zero out remainder 405 ldd zero ; zero out remainder
406 std fpaextra
407 std fpaextra+2
401 std fpaextra+4 408 std fpaextra+4
402 std fpaextra+6 409 std fpaextra+6
403 util_div32a lsl fpaextra+3 ; shift dividend residue into remainder 410 util_div32a lsl fpaextra+11 ; shift residue and remainder over
404 rol fpaextra+2 411 rol fpaextra+10
405 rol fpaextra+1 412 rol fpaextra+9
406 rol fpaextra 413 rol fpaextra+8
407 rol fpaextra+7 414 rol fpaextra+7
408 rol fpaextra+6 415 rol fpaextra+6
409 rol fpaextra+5 416 rol fpaextra+5
410 rol fpaextra+4 417 rol fpaextra+4
411 ldd fpaextra+6 ; now subtract divisor from remainder 418 pshs cc ; save "goes" status from shift
419 bcs util_div32b ; brif it definitely goes
420 ldd fpaextra+6 ; do a trial subtraction
412 subd fpa1+fpa.sig+2 421 subd fpa1+fpa.sig+2
413 ldd fpaextra+4 422 ldd fpaextra+4
414 sbcb fpa1+fpa.sig+1 423 sbcb fpa1+fpa.sig+1
415 sbca fpa1+fpa.sig 424 sbca fpa1+fpa.sig
416 bcs util_div32b ; brif it doesn't go - don't subtract or set bit 425 bcs util_div32c ; brif it doesn't go - goes status on stack is right
417 inc fpaextra+3 ; set quotient bit 426 inc ,s ; set "C" on the stack
418 ldd fpaextra+6 ; actually do the subtraction 427 util_div32b ldd fpaextra+6 ; actually do the subtraction
419 subd fpa1+fpa.sig+2 428 subd fpa1+fpa.sig+2
420 std fpaextra+6 429 std fpaextra+6
421 ldd fpaextra+4 430 ldd fpaextra+4
422 sbcb fpa1+fpa.sig+1 431 sbcb fpa1+fpa.sig+1
423 sbca fpa1+fpa.sig 432 sbca fpa1+fpa.sig
424 std fpaextra+4 433 std fpaextra+4
425 util_div32b dec fpa0+fpa.exp ; done all 32 bits? 434 util_div32c puls cc ; get back "goes" status
426 bne util_div32a ; do another 435 rol fpaextra+3 ; shift quotient over and put the new bit in
436 rol fpaextra+2
437 rol fpaextra+1
438 rol fpaextra
439 dec fpa0+fpa.exp ; done 32 bits?
440 bne util_div32a ; brif not - handle another bit position
441 rts
427 *pragmapop list 442 *pragmapop list