A83: Re: Re: Joe Wingerbermuhle?


True, but I was writing the code to be small (so what else is new hehe).

Joe Wingbermuehle
http://www.usmo.com/~joewing/

-----Original Message-----
From: David Phillips <electrum@tfs.net>
To: assembly-83@lists.ticalc.org <assembly-83@lists.ticalc.org>; Dave
<scheltem@aaps.k12.mi.us>
Date: Thursday, November 05, 1998 6:18 PM
Subject: A83: Re: Joe Wingerbermuhle?


>
>That is a handy routine.  One thing I noticed is that you could save 3
>t-states per loop iteration if you loaded $ff (-1) into D or E before the
>loop and then loaded C from that register instead of from an immediate
>(register to register = 4t, immediate to register = 7t):
>
>getString:
> or a
> ret z
> ld b,a
> ld d,$ff
> xor a
>getStringL1:
> push bc
> ld c,d            ; reload preloaded byte counter
> cpir
> pop bc
> djnz getStringL1
> ret
>
>Actually, now that I look at it again, the value of C never changes because
>it is saved along with the loop counter.  So just load it before the loop:
>
>getString:
> or a
> ret z
> ld b,a
> ld c,$ff            ; preload to save time, it never changes
> xor a
>getStringL1:
> push bc             ; save loop counter AND byte counter
> cpir
> pop bc              ; restores loop counter AND byte counter
> djnz getStringL1
> ret
>
>Hmm, I just noticed one more thing that would make it even faster, if you
>don't mind trashing DE.  Pushes/pops are pretty slow compared to 8-bit
>register instructions (since the Z80 IS 8-bit, not 16-bit).  So why not
>just keep the values in D and E instead of pushing/popping BC:
>
>getString:
> or a
> ret z
> ld b,a
> ld e,$ff            ; preload to save time
> xor a
>getStringL1:
> ld d,b              ; save the loop counter
> ld c,e              ; reload the byte counter
> cpir
> ld b,d              ; restore the loop counter
> djnz getStringL1
> ret
>
>Ah, one more thing, last one, I promise (I know you're all getting sick of
>me by now, this is beginning to look like an article on optimizing...).
>Why do all that register shuffling just to use DJNZ?  The reason that
>instruction is normally faster than looping yourself is that it handles
>the decrementing, comparing and jumping all in one instruction.  But the
>downside is that the only register you can use is B.  B is already tied up
>as the high byte of CPIR's BC counter, so it is faster to do the loop
>yourself with a different register:
>
>getString:
> or a                ; is it 0?
> ret z               ; then we're already pointing to it
> ld d,a              ; D is the loop counter now
> ld e,$ff            ; preload to save time
> xor a               ; clear A, we're checking for 0
>getStringL1:
> ld c,e              ; reload the byte counter
> cpir                ; find the zero byte at the end of the string
> dec d               ; decrement loop counter
> jr nz,getStringL1   ; loop if not 0 (use JP to add 1 byte, save 2 t's)
> ret                 ; the end!
>
>Ok, I think I'm done now, that routine is about as fast as it can get.  The
>only possible change (that I see!) is to swap the JR with a JP, because an
>absolute jump is 2 t-states faster, though it takes an extra byte for the
>16-bit address.  Just for kicks, let's see how the routines compare to each
>other (note that the first time is per iteration, the second is the startup
>time, and the byte size includes the RET):
>                                     | Bytes | T-states: loop (startup)
>================================================================
>Original routine:                    |  13   | 62 (23)
>Preloaded byte counter:              |  14   | 59 (30)
>Preloaded byte counter w/o reload:   |  13   | 55 (30)
>Saved counters w/ registers:         |  14   | 46 (30)
>Final routine:                       |  13   | 41 (30)
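>
>As a rough check on the per-iteration column, here is one way to tally the
>first and last rows from the standard Z80 instruction timings.  It is a
>sketch under assumptions: the labels origLoop and finalLoop are made up for
>illustration, and CPIR is counted at 21 t-states, the cost of one repeating
>pass, so the real time per string grows with the string's length.
>
>```z80
>origLoop:                ; original loop body
> push bc                 ; 11
> ld c,-1                 ; 7
> cpir                    ; 21 per repeating pass (16 on the pass that stops)
> pop bc                  ; 10
> djnz origLoop           ; 13 when taken (8 when it falls through)
>                         ; 11 + 7 + 21 + 10 + 13 = 62
>
>finalLoop:               ; final loop body
> ld c,e                  ; 4
> cpir                    ; 21 per repeating pass
> dec d                   ; 4
> jr nz,finalLoop         ; 12 when taken (7 when it falls through)
>                         ; 4 + 21 + 4 + 12 = 41
>```
>
>The startup figures seem to count RET Z at its taken cost of 11
>(4 + 11 + 4 + 4 = 23, plus 7 for the extra immediate load = 30).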
>
>I hope I got those times/bytes right.  Anyway, this shows that you can
>optimize almost any routine, no matter how small or optimized it looks.  We
>saved 21 t-states per loop iteration.  That may not seem like a lot, but if
>you had 100 strings, that would be 2100 t-states!  Note that the extra
>preloading added 7 t-states to the startup time for the loop, but time
>would still be saved even if it only looped once.  And the routine is the
>same size.
>
>TANSTATFC!
>
>--
>David Phillips <electrum@tfs.net>
>ICQ: 13811951
>AOL/AIM: Electrum32
>86 Central: http://www.tfs.net/~electrum/
>"There ain't no such thing as the fastest code!" -- Michael Abrash
>
>-----Original Message-----
>From: Henry Davidowich <rdaneelolivaw@hotmail.com>
>To: assembly-83@lists.ticalc.org <assembly-83@lists.ticalc.org>
>Date: Thursday, November 05, 1998 5:19 PM
>Subject: A83: Joe Wingerbermuhle?
>
>
>>
>>Hey Joe, do you mind if I borrow this code?  I found it in Ahmed
>>El-Helw's Periodic Table 2.0 (great program!).
>>
>>;---------= Point hl to string a =---------
>>; by: Joe Wingerbermuhle
>>; Thanks, this is a lot easier than my
>>; method of multiplying string # * 12
>>;
>>; Input: a=string number (0 to 255)
>>; hl->string data
>>; Output: hl->string
>>
>>getString:
>> or a
>> ret z
>> ld b,a
>> xor a
>>getStringL1:
>> push bc
>> ld c,-1
>> cpir
>> pop bc
>> djnz getStringL1
>> ret
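>>
>>A minimal calling sketch, assuming TASM-style .db directives and a table
>>of zero-terminated strings stored back to back.  The names showThird and
>>stringTable are hypothetical, chosen only for this example.
>>
>>```z80
>>showThird:
>> ld hl,stringTable   ; HL -> first string in the table
>> ld a,2              ; string number, counting from 0
>> call getString      ; on return HL -> string 2 ("gamma")
>> ret
>>
>>stringTable:
>> .db "alpha",0       ; string 0
>> .db "beta",0        ; string 1
>> .db "gamma",0       ; string 2
>>```
>>
>>The routine overwrites A and B, so save them first if the caller still
>>needs them.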
>>
>