Re: A89: HW1/2 Grayscale Speed






On Fri, Aug 04, 2000 at 10:24:54PM -0400, Scott Noveck wrote:
>
> > 8,000 clocks -- let's say 10,000 just to be sure.
> 
> move.w (An)+,(An)+ is 12 cycles, putting your estimate off a lot.  If we
> assume that we're doing just move.w's 20 times each row, and then adding 10
> to each address register, it's gonna be slightly under 13,200 clocks - far
> over your estimate (and yes, I looked up the 16-bit timings this time =)
>
> With some fancy movem work, however, I can do your 160x100 in a mere
> less-than-11,200 clocks.  I'd like to see someone beat it, since it's the
> one of the keys to my HW2 grayscale working.  The following shall do,
> assuming a0 points to the source and a1 the destination:
> 
> movem.l (a0),d0-d4        ;12+8n= 52
> movem.l d0-d4,(a1)        ;8 +8n= 48
> adda.w #10,a0             ;        6
> adda.w #10,a1             ;      + 6
>                           ;----------
> ;[repeat 100 times]       ;      112 x 100 = 11,200 clocks
> ;Big thanks for Niklas for helping me shave off those 2,000 clocks

I assume that the screen is reprogrammed to 160x100. This also increases the
terrible HW2 frame rate.


doit	macro
	movem.l	(a6)+,d0-a5	; 14 regs = 56 bytes
	movem.l	d0-a5,\1
	endm

	; A6 points to new screen data
	doit	$4c00+56*0
	doit	$4c00+56*1
	doit	$4c00+56*2
	doit	$4c00+56*3
	doit	$4c00+56*4
	doit	$4c00+56*5
	; ...
	; ...
	; 36x2 MOVEM in total (code size: 36*(4+6)=360 bytes)
	; (the last pair uses only 10 registers; bytes copied: 4*(35*14+10)=2000)
	; total time: 35*(12+8*14 + 12+8*14)+(12+8*10 + 12+8*10) = 8864 clocks
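
To double-check the arithmetic in the comments above, here is a minimal sketch in C. It assumes the 12+8n clock figure quoted in this thread for both the load and the store MOVEM.L; the function names are made up for illustration and are not part of any calculator library.

```c
#include <assert.h>

/* MOVEM.L timing as quoted in the thread: 12 + 8n clocks for n registers.
   Assumption: the same figure applies to the load and to the store. */
unsigned movem_l_clocks(unsigned nregs)
{
    assert(nregs >= 1 && nregs <= 16);
    return 12u + 8u * nregs;
}

/* Clocks needed to copy `bytes` bytes with unrolled load/store MOVEM.L
   pairs that move `regs_per_pair` longwords each. The last pair moves
   whatever remainder is left over. */
unsigned unrolled_copy_clocks(unsigned bytes, unsigned regs_per_pair)
{
    assert(bytes % 4 == 0);
    assert(regs_per_pair >= 1 && regs_per_pair <= 16);

    unsigned longs = bytes / 4;            /* 2000 bytes -> 500 longwords */
    unsigned full  = longs / regs_per_pair; /* number of full pairs       */
    unsigned rest  = longs % regs_per_pair; /* registers in the last pair */

    unsigned total = full * 2u * movem_l_clocks(regs_per_pair);
    if (rest != 0)
        total += 2u * movem_l_clocks(rest);
    return total;
}
```

Calling unrolled_copy_clocks(2000, 14) gives 35 full pairs plus one 10-register pair, which matches the 8864 clocks above.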

'Nuff said. We both know the pros and cons of unrolled loops and resizing the
screen to 160x100.


> So, for the sake of following Johan's example, let's stick it in auto-int 1
> and let it run once every 4 calls, as does the code in my original HW2
> grayscale routine.  Then we get approximately 90Hz, which causes a bit of a
> line to run through the screen up and down - I'd assume that means the
> screen is slightly faster, so 100Hz may be a good estimate.

AutoInt 1 runs at ~250 Hz on HW2, so AutoInt 1 / 4 would be ~63 Hz.
(This is a guess -- the only way I can know is for you to write a program
that counts the number of interrupts during a known time interval, like
until you press a key.)
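
The measurement I have in mind boils down to something like the following sketch, written in plain C so the logic can be checked off the calculator. Everything here is an assumption for illustration: the handler is an ordinary function rather than a real interrupt vector, and the elapsed time is passed in by hand (e.g. from a stopwatch).

```c
#include <assert.h>

/* Hypothetical counters. On the calculator, the first one would be
   incremented from the AutoInt 1 handler. */
volatile unsigned long tick_count = 0;
volatile unsigned long plane_switches = 0;

/* Stand-in for the interrupt handler: count every call, and do the
   "every 4th call" work that the grayscale routine would do. */
void auto_int1_handler(void)
{
    tick_count++;
    if ((tick_count & 3) == 0)
        plane_switches++;
}

/* Events per second over a known time interval. */
double rate_hz(unsigned long count, double seconds)
{
    assert(seconds > 0.0);
    return (double)count / seconds;
}
```

If the handler fires 2500 times in 10 seconds, rate_hz reports 250 Hz for the interrupt and 62.5 Hz for the every-4th-call work.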

This "a bit of a line running through the screen up and down" -- do you mean
it's running UP ONLY or is it running DOWN ONLY or is it jumping BOTH UP AND
DOWN?


> >   As "normal" grayscale is 0-1-1 (or whatever, you get the idea), only 2/3
> > of the transfers need to occur since it would be stupid to overwrite plane
> > 1 with itself... Wasted CPU time is 1/12*2/3 = 6%.
> 
> Wouldn't not overwriting plane 1 cause severe smoothness problems?

You might have a point there. I saw a chance to cut off 1/3 of the cycles...


> It actually wouldn't help the smoothness too much, IMHO, because different
> "pieces" of LCD_MEM will be less recently updated than others.  I suppose
> that if the planes are constant - just an image - this would hold true,
> however.

Animation would need double buffering, which is essential if you have
sprites over a background. You don't have enough time to erase the sprites,
scroll the background (or simply repaint it) and then paste the sprites
again, if you don't use double buffering!
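
A minimal sketch of the double buffering idea, in plain C. The 2000-byte plane size comes from the 160x100 screen discussed above; the buffer layout, the function names and the one-byte "sprite" are simplifications made up for the example, not code from any grayscale library.

```c
#include <assert.h>
#include <string.h>

#define PLANE_BYTES 2000   /* 160x100 pixels / 8, as discussed above */

static unsigned char buffers[2][PLANE_BYTES];
static int front = 0;      /* index of the buffer currently displayed */

unsigned char *back_buffer(void)        { return buffers[front ^ 1]; }
const unsigned char *front_buffer(void) { return buffers[front]; }

/* Build the next frame entirely off-screen: repaint the background,
   then paste the sprite. The displayed buffer is never touched. */
void render_frame(unsigned char background, unsigned char sprite,
                  unsigned sprite_offset)
{
    unsigned char *dst = back_buffer();
    assert(sprite_offset < PLANE_BYTES);
    memset(dst, background, PLANE_BYTES);
    dst[sprite_offset] = sprite;
}

/* Make the finished frame the displayed one. */
void flip(void)
{
    front ^= 1;
}
```

The point of the design is that erase, repaint and paste all happen in the hidden buffer, so a half-drawn frame never gets copied to the screen; only flip() changes what is shown.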


> That's one issue I've always questioned - are we saying that HW2 has a 20%
> increase in CPU speed, and 6% due to the lack of DMA?  It doesn't seem THAT
> much faster to me.

10 MHz to 12 MHz is 20%, right? The HW1 DMA reads from RAM and disturbs the
processor. On HW2, it reads from its own memory.

Time for you to write another program. This one only counts cycles (i.e. it
hangs in a loop, increasing a memory variable until a key is pressed). Run it
with interrupts disabled for, say, 10 seconds, then give me the source code
and the result on your calc. (Ints disabled = no kbhit()? Test for ON or
whatever!)
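
Once the same counting loop has been run on both hardware versions for the same length of time, the comparison is simple. A sketch in C, assuming the two loop counts have been read off the calculators by hand; the function name is hypothetical.

```c
#include <assert.h>

/* Percentage by which the second count exceeds the first, given that
   both counts come from the same loop run for the same time.
   Assumption: loop overhead is identical, so the ratio of counts
   equals the ratio of effective CPU speeds. */
double speedup_percent(unsigned long count_hw1, unsigned long count_hw2)
{
    assert(count_hw1 > 0);
    return 100.0 * ((double)count_hw2 - (double)count_hw1)
                 / (double)count_hw1;
}
```

A pure clock difference of 10 MHz versus 12 MHz should show up as about 20%. Anything noticeably above that would have to come from somewhere else, such as the DMA no longer stealing bus cycles.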


> Based on what TI told me, the DMA only slows down RAM access, which I
> suppose would be applicable here and, according to Paul, when doing long
> calculations, but not at all times.  Keep that in mind.

I've read what TI told you (at least the parts you posted to this list, if
you have more, I'd like to read it too!). Anyway, I can only guess what's
happening inside a HW2, but my conclusions are "logical" from what I do know
and from what people (like you!) tell me... When I say "this is the way it
is", I mean "from what I know, this is most likely the way it is".

I'm going to write a supposedly flicker-free HW2 grayscale program right
now. I'll send it to you when it's finished.


> There are multiple Linux distributions that run on PPC-based macs (G3, G4,
> and iMac, I believe) and that run on 68k-based macs.  I'm pretty sure that
> story was just that - a story - considering how apple actually helped people
> get Linux running on iBook hardware and hosted a PPC Linux website on their
> servers/domain. . .

We're not talking about PPCs, we're talking about the original small white
boxy things, the ones with a 68020 (030? -- never mind, it's not important
for the story).
  Anyway, he (my friend) was writing a network driver. I guess he was
subscribed to some (or rather "the") m68k kernel developers' mailing list.
As I remember it, he read that the team had had REALLY strange problems with
the hard drive IRQ, and Apple refused to tell them how it worked. They had to
reverse engineer the operating system. End of story.


/Johan


