From voyageur@sky.net Thu Dec 25 12:32:08 1997 Date: Wed, 24 Dec 1997 23:51:59 -0600 From: voyageur@sky.net To: Jason Stratos Papadopoulos Subject: Re: TinyPI Contents: * UUencoded version of new 113-byte TinyPI, 9304 decimal digits. * UUencoded version of new 112-byte TinyPI program which outputs 8188 digits, beginning with 0003243f. Note that last digit is wrong. * Modified version of your C code as explained below * ASM source-code for 113-byte decimal TinyPI. Should compile with TASM or MASM or A86. > Okay, here's a fixed version that produces correct output. There are > two loops now, because if you stayed with one you'd still have to > somehow deal with a quotient bigger than 16 bits. This way everything > will fit into longs, though it's more work. I had to make a slight change to the type-declarations to keep from having the output be all 0's. The modified code is below. I changed some of the variable names to agree with the names you used the first time. Your code inserted a 0001 before the expected 243f etc. I figured out that by moving your "array[0]+=2" to the end of the loop, the printout begins with 0003, which is the normal PI value, with 3 extra 0's. This value can be eliminated by starting the printout with array[1] instead of array[0]. > Truth be told, I doubt that translating this could make a smaller > assembly program than tinypi. Rather, because this works directly in > hex the resulting assembly code would be smaller than tinypi > (probably much less shuffling around of stuff in registers), but > converting tinypi for binary output would be smaller still. Too bad! Well, somehow I ended up getting this new version to be 112 bytes, which is even smaller than the optimized 113 byte decimal version. However, this version probably needs a visual progress indicator, as it appears to do nothing for close to a minute, before output of all digits. Because the denominator is 32*size+1, the largest value for size in a word-length array is 2047, yielding 8188 hex digits. To get more digits than that, I had to switch to a 32-bit array and denominator, which resulted in a program which was a couple dozen bytes larger. Also, it appears that no matter what I do, one or both of the final 2 array[] values in the printout is wrong. That means it would be a good idea to set size 1 or 2 higher than the number of digits needed in the printout. There may be an inherent error at the end of this method, which makes the other method a little more attractive. However, this version uses less memory, meaning the 32-bit version can generate more hex digits within the 64KB segment. I'll try posting a question to sci.math about the original C version of the decimal output. Hopefully, someone may know how that method can be modified to output hex digits. The hex version seems to be slightly faster than the decimal version, even with 2 loops. The 9304-decimal version takes me 31 seconds when output is redirected to a file, but the enclosed 8188-hex version took only 21 seconds. The C version of the hex program is much slower than the ASM version when trying to show many digits. However, if the decimal version could be adapted for hex output, I believe it could be even faster and smaller than this hex version. Thanks for all your help. -------------------CUT HERE INCLUDING THIS LINE----------------------- 113-bytes, outputs 9304 decimal digits of PI begin 644 tinypi.com MO1`GN31_B]F_<@&XT`?SJU,SP)E3B_+WXY92]^-:`\*7B\4#V_>G<`$#QA/7 M2S/_EY+W\Y>+\C/2]_.6DO?SB9=Q`3/2`\83UUM+=+V65N#ZPYUH,-4 ` end -------------------CUT HERE INCLUDING THIS LINE----------------------- 112-bytes, outputs 8188 hexadecimal digits of PI, beginning with 0003243f. Note that the very last output digit is wrong. The final value fed into BP can be reduced by 1, which eliminates that bad digit, along with 3 good ones. begin 644 pi8188hx.com MOP`"N?\'*\#SJ[_P?[WA_[X``BO;4P/;B]&+`/?UB0"+REM#@?O_!W+K2U,# MVXO']R`#P8/2`(D`B\I;"]MUZH,``H/M`D]URKW_![,0K;$$*]+W\U+B^;$$ 66H#Z"78#@,(G@,(PM`+-(>+N377?PTE4 ` end -------------------CUT HERE INCLUDING THIS LINE----------------------- #include int main(void) { unsigned short size=200, array[2046]={0}; unsigned long int denom, remainder, terms; int i; unsigned long int dividend, quotient; terms = size*16; denom = 2*terms+1; /* these can be precalculated. */ while(terms>0) { remainder = 0; for( i=0; i=0; i-- ) { dividend = terms * array[i] + remainder; array[i] = dividend & 0x0000ffff; remainder = dividend >> 16; /* note that this is much easier in assembly; in particular, no & or >> is needed since the 32-bit product is automatically partitioned in dx:ax */ } array[0] += 2; denom-=2; terms--; } for (i=0; i dx:ax xchg si,ax ; get the original dx to multiply it push dx mul bx pop dx add ax,dx xchg di,ax ; dx:ax * bx -> di:si mov ax,bp ; ax = 10,000 add bx,bx ; align for 2-byte array elements mul word ptr [bx+offset Buffer-2] ; because bx is decremented at the end, ; must subract 2 to point at correct item add ax,si adc dx,di ; dx:ax = dx:ax + di:si dec bx ; bx = 2*j -1 xor di,di xchg di,ax xchg dx,ax div bx ; dx = 0, ax = high part of prev di:si ; dx:ax / bx -> ax, dx=remainder xchg di,ax mov si,dx xor dx,dx div bx ; dx = 0, ax = low part of prev di:si xchg si,ax xchg dx,ax div bx mov [bx+offset buffer-1],dx ; because bx was decremented ; above, just subtract 1 to ; access correct element. xor dx,dx add ax,si ; si:di + ax -> dx:ax adc dx,di pop bx dec bx jnz InnerLoop div bp ; div by 10,000 - resulting mod has 4 decimal digits push dx ; save remainder for use in next loop. add ax,cx mov cx,4 mov bl,10 ; dividing by 10 yields decimal digits. Num1: cwd ; "xor dx,dx" not needed, value is always < 7fff div bx ; divide by 10 push dx loop Num1 mov cl,4 Num2: ; program can be made smaller by changing ; the following 4 instructions to: pop dx ; pop ax add dl,"0" ; add al,"0" mov ah,2 int 21H ; int 29H ; this saves 3 bytes, but the output ; can't be redirected to a file loop Num2 pop cx ; restore remainder pushed in DX above pop bx ; restore outer-loop's value sub bx,+14 ; decrement by 14. jnz OuterLoop ret ; .COM always has 0000 on the stack. ; instruction at offset 0000 is always ; an INT 20H, which ends the program. ProgramEnd: Buffer=$+1 ; align even without adding a byte end start ; This is offset 0172. Next 32564 words through FFD9h ; are used as workspace. =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Mark Andreas http://www.sky.net/~voyageur PGP key available via finger or from webpage =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ From jasonp@Glue.umd.edu Thu Dec 25 12:32:22 1997 Date: Thu, 25 Dec 1997 12:21:30 -0500 (EST) From: Jason Stratos Papadopoulos To: voyageur@sky.net Cc: jasonp@isr.umd.edu Subject: Re: TinyPI Wow, this is definitely going to be included on the web page! If you're concerned about the last digit being wrong, it may be a good idea to have array[0] = (units digit) + 1st three hex digits. So you add 2000 to array[0] instead of 2, giving you three more digits in the final tally. Just a thought... I'll try to piece together how the original really works; if it can be adapted to produce hex it'll be blazing fast, *really* small, and output partial results. What's surprising to me is how fast the modified version really is (your 112 byte gem); One would think that it would take twice as long as the 125 byte decimal version because the array worked on does not shrink as the computation progresses. Instead it's 30% faster! The only difference I can think of is that you save one multiply (by 10000) and some register movement in all the inner loops. At this rate, doing the same to the original version means the computation would finish in the blink of an eye! I'll keep working on it, and will let you know if I come up with anything interesting. Thanks again! jasonp