A89: Re: HW1/2 Grayscale Speed (was Re: Grayscale Troubles)







> The screen is 160x100 pixels = 2000 bytes = 1000 words. Each word takes 4
> clocks to read and 4 clocks to write, i.e. copying a whole screen would take
> 8,000 clocks -- let's say 10,000 just to be sure.

move.w (An)+,(An)+ is 12 cycles, which puts your estimate off by a lot.  If
we assume that we're doing just 10 move.w's (20 bytes) per row, and then
adding 10 to each address register, it's going to be slightly under 13,200
clocks - far over your estimate (and yes, I looked up the 16-bit timings this
time =)
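To make the comparison concrete, here is a back-of-the-envelope cycle model in Python. It only uses figures quoted in this thread (4 clocks per bus read or write in the original estimate, 12 clocks per move.w (An)+,(An)+, and the 6 clocks per adda.w used in the movem listing below); it is a rough sketch, not a cycle-exact 68000 simulator.

```python
# Rough cycle model for copying a 160x100, 1-bit-per-pixel plane.
# Per-instruction figures are the ones quoted in this thread, not measured.

WIDTH_PX, HEIGHT_PX = 160, 100
BYTES_PER_ROW = WIDTH_PX // 8          # 20 bytes per row
WORDS_PER_ROW = BYTES_PER_ROW // 2     # 10 words per row

def naive_estimate() -> int:
    """Original estimate: 4 clocks to read + 4 clocks to write each word."""
    words = WORDS_PER_ROW * HEIGHT_PX  # 1000 words
    return words * (4 + 4)

def move_w_loop(move_w_clocks: int = 12, adda_clocks: int = 6) -> int:
    """Unrolled move.w (An)+,(An)+ per word, plus two adda.w per row."""
    per_row = WORDS_PER_ROW * move_w_clocks + 2 * adda_clocks
    return per_row * HEIGHT_PX

print(naive_estimate())  # 8000
print(move_w_loop())     # 13200
```

The timings are parameters, so other figures from the 68000 timing tables can be plugged in.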

With some fancy movem work, however, I can do your 160x100 in a mere
11,200 clocks or less.  I'd like to see someone beat it, since it's one of
the keys to my HW2 grayscale working.  The following will do, assuming a0
points to the source and a1 to the destination:

movem.l (a0),d0-d4        ;12+8n= 52  (n = 5 longs = 20 bytes = 1 row)
movem.l d0-d4,(a1)        ;8 +8n= 48
adda.w #30,a0             ;        6  advance one row; assumes 30-byte
adda.w #30,a1             ;      + 6  (240-pixel) rows, as in LCD_MEM
                          ;----------
;[repeat 100 times]       ;      112 x 100 = 11,200 clocks
;Big thanks to Niklas for helping me shave off those 2,000 clocks
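The per-row total in the comments follows from the MOVEM.L timing formulas 12+8n (memory to registers) and 8+8n (registers to memory). The Python sketch below recomputes it so the 11,200 figure can be re-checked; the 6-clock adda.w figure is taken from the listing as written and is a parameter so a different value can be substituted.

```python
# Cycle model for the movem-based row copy, using the MOVEM.L timing
# formulas from the listing: 12+8n (load) and 8+8n (store), n = registers.

def movem_row_clocks(n_regs: int = 5, adda_clocks: int = 6) -> int:
    load = 12 + 8 * n_regs    # movem.l (a0),d0-d4
    store = 8 + 8 * n_regs    # movem.l d0-d4,(a1)
    return load + store + 2 * adda_clocks   # plus two adda.w per row

def movem_copy_clocks(rows: int = 100) -> int:
    return rows * movem_row_clocks()

# Five long-word registers move 5 * 4 = 20 bytes, i.e. one 160-pixel row.
print(movem_row_clocks())   # 112
print(movem_copy_clocks())  # 11200
```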


> Now, assume that the
> refresh rate is 100Hz, giving a total of 1,000,000 clocks per second. The
> CPU speed is 12MHz, or 12,000,000 cycles per second, so this copying would
> take 1/12, or about 8%, of the CPU time.

If we put the above code into auto-int 5, we can tune it to run as close to
the screen refresh rate as possible.  The problem is, WE DON'T KNOW THE
SCREEN REFRESH RATE!

So, to follow Johan's example, let's put it in auto-int 1 and let it run
once every 4 calls, as the code in my original HW2 grayscale routine does.
That gives approximately 90Hz, which makes a faint line roll up and down the
screen - I'd assume that means the screen refreshes slightly faster, so
100Hz may be a good estimate.
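Running the copy on every fourth auto-int 1 call is a simple tick divider. The Python sketch below simulates it; the 360 Hz interrupt rate is an assumption inferred from the ~90 Hz result (90 x 4), and `copy_plane` is a hypothetical stand-in for the movem loop, not a TIOS or TIGCC function.

```python
# Simulated tick divider: the handler is called at the interrupt rate and
# performs the (hypothetical) plane copy only on every 4th call.

AI1_HZ = 360   # assumed auto-int 1 rate, inferred from 90 Hz * 4
DIVIDER = 4    # run the copy once every 4 calls

class GrayHandler:
    def __init__(self, divider: int = DIVIDER):
        self.divider = divider
        self.counter = 0
        self.copies = 0

    def copy_plane(self) -> None:
        # Stand-in for the movem copy loop; here it only counts invocations.
        self.copies += 1

    def on_interrupt(self) -> None:
        self.counter += 1
        if self.counter == self.divider:
            self.counter = 0
            self.copy_plane()

handler = GrayHandler()
for _ in range(AI1_HZ):   # one simulated second of interrupts
    handler.on_interrupt()
print(handler.copies)     # 90
```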

Anyway, correcting the calculations, we get 1,008,000 clocks per second, or
1/11.9 (about 8.4%) of the CPU time.  That's slightly above the earlier
estimate - oh, and did I mention that this one looks much uglier?
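A sketch of the arithmetic, using the 11,200-clock movem copy, the ~90 copies per second from auto-int 1, and the 12 MHz HW2 clock quoted earlier. Note that 12,000,000 / 1,008,000 is about 11.9, so the copy takes roughly one CPU clock in 11.9.

```python
CPU_HZ = 12_000_000        # HW2 clock, as quoted in the thread
COPY_CLOCKS = 11_200       # movem copy of one 160x100 plane
COPIES_PER_SEC = 90        # auto-int 1 divided by 4, as observed

clocks_per_sec = COPY_CLOCKS * COPIES_PER_SEC
ratio = CPU_HZ / clocks_per_sec   # CPU clocks per clock spent copying
share = clocks_per_sec / CPU_HZ   # fraction of CPU time spent copying

print(clocks_per_sec)        # 1008000
print(round(ratio, 1))       # 11.9
print(f"{share:.1%}")        # 8.4%
```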

>   As "normal" grayscale is 0-1-1 (or whatever, you get the idea), only 2/3
> of the transfers need to occur since it would be stupid to overwrite plane 1
> with itself... Wasted CPU time is 1/12*2/3 = 6%.

Wouldn't skipping the copy of plane 1 cause severe smoothness problems?
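The quoted 2/3 figure can be checked by walking the plane sequence: with a repeating 0-1-1 pattern, a copy is only needed when the displayed plane changes. The Python sketch below counts those transitions; the pattern and the 1/12 CPU share come from the quoted text, and the skip-if-unchanged rule is the quoted proposal, not a claim that it looks smooth.

```python
from fractions import Fraction

def copies_needed(pattern: list[int]) -> int:
    """Count frame transitions (cyclically) where the displayed plane changes."""
    n = len(pattern)
    return sum(1 for i in range(n) if pattern[i] != pattern[(i + 1) % n])

PATTERN = [0, 1, 1]                       # "normal" grayscale cycle from the quote
copy_fraction = Fraction(copies_needed(PATTERN), len(PATTERN))
wasted = Fraction(1, 12) * copy_fraction  # CPU share if a full copy costs 1/12

print(copy_fraction)           # 2/3
print(f"{float(wasted):.1%}")  # 5.6%
```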

>   To make things run more smoothly, each memory transfer can be split up
> into smaller pieces, as long as it remains synchronized with the updating of
> the screen (easy if HW2 works as I believe it does, but must reprogram
> AutoInt 5).

It actually wouldn't help the smoothness much, IMHO, because different
"pieces" of LCD_MEM would be updated at different times, so some would
always be staler than others.  I suppose it would work, however, if the
planes are constant - just a static image.
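Splitting the transfer means each interrupt copies only a band of rows. The Python sketch below models the two planes as byte arrays with a 30-byte (240-pixel) row stride, an assumption borrowed from the LCD_MEM layout, and copies 25 of the 100 rows per call (the `PIECES` split is hypothetical), so a full plane is refreshed over four calls. Between calls, some bands are newer than others, which is the concern raised above.

```python
STRIDE = 30        # bytes per row, assuming 240-pixel-wide buffers
ROW_BYTES = 20     # 160 visible pixels at 1 bit per pixel
ROWS = 100
PIECES = 4         # hypothetical: split one plane copy over 4 calls

def copy_piece(src: bytearray, dst: bytearray, piece: int) -> None:
    """Copy the visible bytes of one band of rows from src to dst."""
    first = piece * ROWS // PIECES
    last = (piece + 1) * ROWS // PIECES
    for row in range(first, last):
        off = row * STRIDE
        dst[off:off + ROW_BYTES] = src[off:off + ROW_BYTES]

src = bytearray([0xFF]) * (STRIDE * ROWS)
dst = bytearray(STRIDE * ROWS)

copy_piece(src, dst, 0)           # after one call only the top quarter is new
print(dst[0], dst[25 * STRIDE])   # 255 0

for piece in range(1, PIECES):    # the remaining calls finish the plane
    copy_piece(src, dst, piece)
```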

> On a HW1, the LCD controller DMA is reading the pixel data from RAM, one
> byte at a time. Again, assume a 160x100 screen (it must've been reprogrammed
> for this, otherwise it will read all 240x128 pixels!!!). Each byte needs only
> to be read, not written back. Anyway, it's a total of 800,000 clocks per
> second assuming 100Hz (160Hz is possible, but let's stick to the same
> figures as above). Luckily, the CPU doesn't need to know about this going on
> in the background, except that the DMA steals memory cycles from it! [In my
> test program] the CPU seems to use ~70% of the bus bandwidth; the 8% that
> the LCD uses turns into 6% lost for the CPU.

That's one point I've always questioned - are we saying that HW2 gets a 20%
increase in CPU speed, plus another 6% from the lack of DMA?  It doesn't seem
THAT much faster to me.

Based on what TI told me, the DMA only slows down RAM accesses.  I suppose
that applies here and, according to Paul, during long calculations, but not
at all times.  Keep that in mind.
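The quoted DMA figures can be reproduced as below. The 4 clocks per byte read, the 100 Hz refresh and the ~70% bus usage come from the quoted message; the 10 MHz HW1 clock is an assumption (it is the rate that makes 800,000 clocks come out to the quoted 8%).

```python
HW1_CPU_HZ = 10_000_000   # assumed HW1 clock; gives the quoted 8%
HW2_CPU_HZ = 12_000_000   # HW2 clock quoted in the thread
BYTES_PER_FRAME = 2000    # 160x100 pixels at 1 bit per pixel
CLOCKS_PER_READ = 4       # one bus read per byte
REFRESH_HZ = 100
CPU_BUS_SHARE = 0.70      # fraction of bus cycles the CPU uses (quoted measurement)

dma_clocks = BYTES_PER_FRAME * CLOCKS_PER_READ * REFRESH_HZ
dma_share = dma_clocks / HW1_CPU_HZ
# Quoted model: the CPU only loses the stolen cycles it would have used itself.
cpu_loss = dma_share * CPU_BUS_SHARE

print(dma_clocks)               # 800000
print(f"{dma_share:.0%}")       # 8%
print(f"{cpu_loss:.1%}")        # 5.6%
print(HW2_CPU_HZ / HW1_CPU_HZ)  # 1.2, the "20% faster" figure
```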


> 6% on both HW1 and HW2?

Definitely not 6%; I'll let you check my comments before we consider
calculating a new figure.

> CONCLUSION: Grayscale "in software" on HW2 is just (or not far from) as fast
> as grayscale "in hardware" on HW1. (And don't forget that when the HW2 CPU is
> doing something "useful", it's doing it 20% faster than the HW1 CPU.)

Don't forget that HW2 Grayscale is MUCH, MUCH, MUCH UGLIER!!!  You have to
own a HW2 calc to appreciate that.

> (Right? Wrong?)

I'll let you decide =P

> A friend told me (more like a joke) that the Linux team that tried/s to
> target (I think it was) the Apple Macintosh, have had (or still have) a hard
> time getting information about its hardware. From what they do know about
> it, they guess that Apple refuses to release any information because they're
> so ASHAMED of its INCREDIBLY poor design...

There are multiple Linux distributions that run on PPC-based Macs (G3, G4,
and iMac, I believe) and on 68k-based Macs.  I'm pretty sure that story was
just that - a story - considering how Apple actually helped people get Linux
running on iBook hardware and hosted a PPC Linux website on their
servers/domain...

    -Scott



