Re: A89: matrix







Olle Hedman writes:
 > you dont have much of a choice more than move.l (Ax)+,(Ay)+

Actually, you do. If you have a lot of data to move, a little extra
cost on the preparation/cleanup side doesn't matter if you save
heaps on the actual transfer. In that case this might help:

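  ; a0 = source, a1 = destination,
  ; d0 = number of passes - 1 (dbra counter, untouched by the movems)
  movem.l d1-d7/a2-a6,-(sp)      ; Save the 12 registers used as buffer
loop:
  movem.l (a0)+,d1-d7/a2-a6      ; Suck in 48 bytes at once
  movem.l d1-d7/a2-a6,(a1)       ; Store them at destination
  movem.l (a0)+,d1-d7/a2-a6      ; Suck next 48 bytes in
  movem.l d1-d7/a2-a6,48(a1)     ; Store right after the first 48
  ...                            ; (more pairs at 96(a1), 144(a1), ...)
  lea     96(a1),a1              ; Advance a1 by the bytes stored per pass
                                 ; (movem can't store with (a1)+)
  dbra    d0,loop                ; Decrement d0, loop until it hits -1
  movem.l (sp)+,d1-d7/a2-a6      ; Restore registers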

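With the move.l method you waste 1 bus cycle on the insn fetch for
every 4 bytes moved. With the movem.l method each insn is only two
words (opcode and register mask), so you waste 4 cycles for every 48
bytes moved; that is, your bus bandwidth loss goes from 20% to only
7.7%. (Strictly speaking, movem.l cannot store with (An)+, so every
store after the first in a pass needs a displacement, which costs one
more fetch, and a lea is needed to advance the destination; the real
loss is therefore somewhat higher, though still well below 20%.) Of
course, if your block size is known a priori and is small enough to
warrant a full loop unroll, then d0 becomes free, so you can move up
to 52 bytes per 2 insns, which further decreases the bandwidth waste
to 7.1%. If you need absolutely everything that is possible, you can
disable the interrupts, save a7 to some known location and include a7
in the transfer too; your wasted bandwidth then drops to an all-time
low of 6.7%.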
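The percentages can be reproduced with a simple count. Assume, as the
figures above do, a 16-bit bus on which each word read, each word
written and each insn word fetched takes one bus cycle, and count two
words per movem.l. Moving B bytes then costs B data cycles, and with f
fetch cycles the wasted fraction w is as below. This sketch ignores
loop overhead (dbra) and any extra displacement words on the stores.

  $$ w = \frac{f}{f + B} $$
  $$ \texttt{move.l}:\ \frac{1}{1+4} = 20\%, \quad
     \texttt{movem.l}\ \text{(12 regs, 48 B)}:\ \frac{4}{4+48} \approx 7.7\% $$
  $$ \text{13 regs (52 B)}:\ \frac{4}{4+52} \approx 7.1\%, \quad
     \text{14 regs incl. a7 (56 B)}:\ \frac{4}{4+56} \approx 6.7\% $$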

It's an old trick that was worth doing on a 68000. It went out of
fashion with the 68010: the 68010 and all later CPUs have a loop mode
(or equivalent) in which data blocks can be moved without insn fetches
slowing down the transfer (i.e. your copy speed is limited only by the
actual transfer speed of the bus). On the old 68000, however, the
above method was quite popular when you needed those few extra bus
cycles.

Regards,

Zoltan

