Re: A89: Lookup Tables

To: assembly-89@lists.ticalc.org
Subject: Re: A89: Lookup Tables
From: Don Barnes <don.barnes@wmich.edu>
Date: Sat, 06 Feb 1999 18:34:49 -0500
Delivered-To: assembly-89-outgoing@towerguard.unix.edu.sollentuna.se
Delivered-To: assembly-89@lists.ticalc.org
References: <5b7b9342.36bc7a96@aol.com>
Reply-To: assembly-89@lists.ticalc.org


> How would one go about doing lookup tables in 68k?

There are a number of ways.  You could simply store the labels in a
table, sort of 
how you did it in your example.

So this:

> W1: .db "AOL Disk",0
> W2: .db "Pocket Protector",0
 .
 .
 .
> W16: .db "Giant Screwdriver",0
> W17: .db "Soldering Iron",0
> 
> WeaponTable: .dw W1,W2,W3,W4,W5,W6,W7,W8,W9,W10,W11,W12,W13,W14,W15,W16,W17

Would become:

W1 dc.b "AOL Disk",0
W2 dc.b "Pocket Protector",0
 .
 .
 .
W16 dc.b "Giant Screwdriver",0
W17 dc.b "Soldering Iron",0
 
WeaponTable 
  dc.l W1,W2,W3,...,W16,W17

And you would reference the entries like this:

 ; assume d0 contains the index of the string you wish to access
 ;    e.g. 0,1,2,...,16
 lsl.w #2,d0
 move.l	WeaponTable(pc,d0.w),a0
 ; now a0 contains the address of the start of the string

Depending on the size of your data, you could cut the size of you table 
in half by storing word-size offset values instead of the labels, which 
take up 4 bytes each.  I've attached a document I found on ticalc.org. 
It's about jump tables, a similar topic.

-Don

-- 
 Don Barnes
   E-MAIL: don.barnes@wmich.edu
       IM: DonRBarnes
    EFNET: donman
      WEB: http://mf.calc.org

Jumptables in 68k

Hello, I found this text in AmigaGuide format in my asm directory on my
harddisk, and it should be usefull in fargo to, so I thought I share it
with the rest of the fargo people. I have not tested any of it in a68k, but
I thought it would be better to send I up now, when I still remember.

BTW, I don't know who wrote it in the first place.

/ Oscar Lindberg
  f96osli@dd.chalmers.se

---------

     Many times, when I first started assembly language coding, I would
pour over a piece of code that I just needed to squeeze a couple more clock
cycles out of.  The code that, when a raster check is done, only over runs
the frame by one raster.  I still do that, but the code that I find myself
looking at in that way now, would have taken at least two whole frames
back then.  A large part of this is a direct result of a simple observation
that I made one day.  The ideal piece of code is one that, with very few
exceptions, makes no branches.  Why?  Well, not only are branches slow
instructions, but on machines with a cache, in many cases, it will flush
the cache.

     What did I do about this?  Well, the first, and foremost, step is to
really look at the code.  The end result of the code really has to be
analyzed.  Take for example the following PSet() routine.

* a0: byte in bit plane to be set.
* d0: bit in byte...
* d1: offset to next plane
* d2: color
PSet:		btst	#0,d2		; is this bit to be set?
		beq.b	.1		; nope
		bset.b	d0,(a0)		; set the bit
.1:		adda.l	d1,a0		; next plane
		btst	#1,d2		; is this bit to be set?
		beq.b	.2		; nope
		bset.b	d0,(a0)		; set the bit
.2:		adda.l	d1,a0		; next plane
		btst	#2,d2		; is this bit to be set?
		beq.b	.3		; nope
		bset.b	d0,(a0)		; set the bit
.3:		rts

This is by far the worst case.  This routine would take 92 clock cycles
in the best case!  It could be improved by replacing the btst's with lsr's
and the beq's with bcc's, but the code is still slow.  The killer:  the
branches.

     This introduces the one branch method.  It is essentially a 
construct that everyone has used in HLL's forever:  the case statement.
If one wanted to write the above code in C, they wouldn't use three if's,
would they?  Let's replace the above code with a simple table lookup and
see what kind of speed increase it gives.

* a0: byte in bit plane to be set.
* d0: bit in byte...
* d1: offset to next plane
* d2: color
PSet:		add.w	d2,d2			; word offset
		add.w	d2,d2			; long offset
		move.l	Table(pc,d2.w),a1	; addr. of correct function
		jmp	(a1)			; call it!

Table:		dc.l	.0,.1,.2,.3

.3:		bset.b	d0,(a0,d1)		; bpl1
		adda.l	d1,a0			; a0->bpl1
.2:		bset.b	d0,(a0,d1)		; bpl1 or bpl2
.1:		bset.b	d0,(a0)			; bpl0
.0:		rts

While this version accomplishes the samething, it does it in much, much
less time.  This time it only takes 50 clock cycles, 54% of before.  This
time the killer isn't the branches, it's memory references.  The (d8,PC,Xn)
addressing mode takes 14 cycles on long data, but only 10 on word or byte
data.  We can improve this timing and get rid of the add's by making the
table a byte offset.  This will cause a penalty on the jmp, but it will
be well justified.

* a0: byte in bit plane to be set.
* d0: bit in byte...
* d1: offset to next plane
* d2: color
PSet:		move.b	Table(pc,d2.w),d2	; offset of function
		jmp	Table(pc,d2)		; call it!

Table:		dc.b	.0-Table,.1-Table,.2-Table,.3-Table
		even

.3:		bset.b	d0,(a0)			; bpl0
		adda.l	d1,a0			; a0->bpl1
.2:		bset.b	d0,(a0,d1)		; bpl1 or bpl2
.1:		bset.b	d0,(a0)			; bpl0 or bpl1
.0:		rts

This routine is now reduced to a mere 44 clock cycles.  Saving only six
cycles my not seem like much, but in a time critical part of code it can
make all the difference in the world.  This still isn't the fastest that
this routine could be, but the optimizations that remain are beyond the
scope of this article.  The bottom line is, the shortest distance between
two points is a straight line!

References:

A89: Lookup Tables

From: ZeromusMog@aol.com

Prev: Re: A89: Simple Grid-based Putsprite Routine
Next: A89: where to get assembler
Index(es):
- Main
- Thread