A86: Re: questions re random #s and assembling


> My second question relates to how the assemblers work. I have recently
> become intrigued by the possibility of self-modifying programs. After
> seeing programs that wrote high scores and save states into their own
> code, I figured that it would be possible to create a program that
> changed its own code. Although I can't think of any practical
> applications for this outside of AI (which would be very difficult), I'm
> sure there are many. I figure that I can just get AssemblyStudio86 to
> give me hex listings for programs so I can determine the hex
> equivalencies of the various commands (and thus know what to write in to
> change). However, I don't understand how labels work in machine code.
> Could someone explain what an assembler does to a label when it's
> building the code? And, if there is a reference for hex equivalencies
> somewhere, could someone direct me to it?
>

Self-modifying code is a very important and powerful tool in programming.
It can be used to change both instructions and data.  While changing actual
instructions into different instructions is rare, changing the data bytes of
instructions is quite common.  For example, say a routine has a value passed
into it in the A register.  Instead of a push/pop for the register, which
isn't always feasible because of early returns and such, self-modifying code
can be used to save it:

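; a = number passed in
Routine:
 ld (_@Value),a        ; save A into the data byte of the ld a,0 below

 ...

_@Value =$+1
 ld a,0                ; reloads the saved value (the 0 has been overwritten)
 ...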

Before going on, an explanation of labels is required.  Jeremy Goetsch
(author of Asm Studio) defines a label as follows:

"Label

A label is a symbol used by the assembler that is assigned a numeric value
and acts as a constant in expressions. The most common use of labels is to
name a location in a program, so that the location can be referenced by name
when using a jump or call.

Labels can be up to 31 characters long and can include letters, numbers, and
the underscore character (_). Labels are case sensitive (unless the "Labels
are case insensitive" option in the Assembly Options dialog box is set)."

Fortunately for us, assembly programs on the 86 always run from the same
place in memory: every program starts at $d748.  If you think about how this
is laid out in memory, a program might look like this:

[$d748]   sub a          ; 1 byte
[$d749]   ld hl,$1234    ; 3 bytes
[$d74c]   call _dispAHL  ; 3 bytes (_dispAHL = $4a33)
[$d74f]   ret            ; 1 byte
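This also bears on the question about hex equivalencies. Below is a small Python model (Asm Studio does not produce it) that lays the example program out starting at $d748 and shows the bytes each instruction assembles to. The opcodes are the standard Z80 encodings. The $4a33 address for _dispAHL comes from the example, and the helper name `layout` is made up for this sketch.

```python
ORIGIN = 0xD748  # TI-86 assembly programs start here

# (source text, bytes it assembles to) using standard Z80 encodings
program = [
    ("sub a",         [0x97]),
    ("ld hl,$1234",   [0x21, 0x34, 0x12]),  # 16-bit operand stored low byte first
    ("call _dispAHL", [0xCD, 0x33, 0x4A]),  # _dispAHL assumed to be $4a33
    ("ret",           [0xC9]),
]

def layout(origin, instructions):
    """Return (address, text, bytes) rows, advancing by each instruction's size."""
    rows = []
    addr = origin
    for text, code in instructions:
        rows.append((addr, text, code))
        addr += len(code)
    return rows

rows = layout(ORIGIN, program)
for addr, text, code in rows:
    hexbytes = " ".join(f"{b:02X}" for b in code)
    print(f"[${addr:04x}]  {hexbytes:<9} {text}")
```

The addresses match the listing. Notice that $1234 is stored as 34 12, low byte first. That ordering matters whenever you patch a 16-bit value in place.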

Say you need to jump to the beginning of the program.  You would need to
know where that is in memory once the program is assembled.  By using a
label, you can have the assembler calculate it for you.

Start:
 sub a
Cool:
 ld hl,$1234

If this is the start of the program, then (to the assembler), Start has the
value $d748 and Cool has the value $d749.  Labels are just like equates or
constants:

 JOE = 1
 num = $4a
 bitmask = %10101101

The assembler keeps track of all these for you.  It makes programming much
easier, because you can use names instead of numbers for everything.  And if
you add instructions in between labels, the assembler automatically
recalculates all the label and jump values for you.
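A common way for an assembler to handle labels is to make two passes. The first pass counts bytes from the origin and records the address of each label. The second pass emits the bytes and substitutes those addresses. The Python sketch below is a hypothetical toy: it hard-codes a handful of instructions, and `encode` and `assemble` are invented for illustration. It does not describe how Asm Studio or TASM is actually written. It shows why a jump can refer to a label that is defined later in the file.

```python
ORIGIN = 0xD748

def encode(line, labels):
    """Return the bytes for one instruction (only a few are supported)."""
    if line == "sub a":
        return [0x97]
    if line == "ld hl,$1234":
        return [0x21, 0x34, 0x12]
    if line == "ret":
        return [0xC9]
    if line.startswith("jp "):
        # In pass 1 the label may not be known yet, but the size is still 3.
        target = labels.get(line[3:], 0)
        return [0xC3, target & 0xFF, target >> 8]  # address stored low byte first
    raise ValueError(f"unsupported instruction: {line}")

def assemble(lines, origin=ORIGIN):
    # Pass 1: count bytes and record the address of every label.
    labels, pc = {}, origin
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = pc        # a label is just the current address
        else:
            pc += len(encode(line, {}))
    # Pass 2: emit bytes now that every label, even a forward one, is known.
    code = []
    for line in lines:
        if not line.endswith(":"):
            code += encode(line, labels)
    return labels, code

# A toy program: it only needs to assemble, not do anything useful.
source = ["Start:", "sub a", "Cool:", "ld hl,$1234",
          "jp Done", "jp Start", "Done:", "ret"]
labels, code = assemble(source)
print({name: f"${addr:04x}" for name, addr in labels.items()})
```

Start and Cool come out as $d748 and $d749, matching the text. If you insert an instruction before Done, only the recorded addresses change. The source does not.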

Back to the self-modifying code example above.  In constant expressions, the
$ symbol represents the current value of the program counter, i.e. the
address of the line being assembled.

Start:
Label =$
 sub a
Cool:
Joe =$

Here, Start and Label both have the value $d748, and Cool and Joe both have
the value $d749.  Many instructions assemble into more than one byte: an
opcode followed by data.  For example:

 ld a,0

The general form of the instruction is

 ld a,NN     ; 2 bytes

The first byte of the instruction ($3e) is the opcode: it tells the processor
to load a byte into the A register.  The second byte is the data byte.  It
contains the value 0, or whatever follows the <ld a,>.  Say you want to
change the value of 0 to something else, like 5.  The first way to do it is
as follows:

 ld a,5
 ld (Label+1),a

...

Label:
 ld a,0
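A byte-level model of the two possible stores may make the next point concrete. This Python sketch uses a made-up address for Label and models only the ld a,n instruction (opcode $3e). Writing to Label+1 changes the value that gets loaded. Writing to Label replaces the opcode itself ($05 happens to be dec b).

```python
LABEL = 0xD760  # hypothetical address of Label (the "ld a,0" instruction)
LD_A_N = 0x3E   # Z80 opcode for ld a,n; the data byte follows it

def run_ld_a(mem, addr):
    """Return the value "ld a,n" at addr would load; fail if the opcode was clobbered."""
    if mem[addr] != LD_A_N:
        raise RuntimeError(f"byte at ${addr:04X} is ${mem[addr]:02X}, no longer ld a,n")
    return mem[addr + 1]

original = {LABEL: LD_A_N, LABEL + 1: 0x00}   # ld a,0

# ld a,5 / ld (Label+1),a : only the data byte changes
good = dict(original)
good[LABEL + 1] = 5
loaded = run_ld_a(good, LABEL)                # now behaves like ld a,5

# ld a,5 / ld (Label),a : the opcode itself is overwritten
bad = dict(original)
bad[LABEL] = 5
try:
    run_ld_a(bad, LABEL)
    clobbered = False
except RuntimeError:
    clobbered = True

print(loaded, clobbered)
```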

Because Label points to the beginning of the load instruction, changing the
value at Label would change the instruction's opcode.  Not good!  Adding 1 to
Label points to the data byte, which is what you want to change.
However, if you are changing the value in multiple places, it is easy to
forget to add 1.  And the offset isn't always 1: <ld bc,NNNN> is 3 bytes with
its data starting at +1, but <ld ix,NNNN> is 4 bytes (a 2-byte opcode, then
the data), so you have to add 2.  By defining an equate (constant) that points
to the correct byte of the instruction, you can eliminate those mistakes
(assuming you point to the correct byte in the equate :)  This is accomplished
by using the $ symbol:

Value =$+1
 ld a,0

Value now points to the data byte of the opcode.  Note that there is a space
between the label name and the equal sign.  Assembly Studio (and probably
TASM) thinks the equal sign is part of the label name if there's no space.
So don't forget it.  A couple more examples might help:

 ld a,0
Value =$-1             ; point backwards one byte (the data byte of ld a,0)


Value =$+2
 ld ix,$ffff           ; 4-byte instruction: 2 opcode bytes, then the data
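The same offset arithmetic works for any instruction once you know how many opcode bytes come before the data. This Python sketch tabulates a few standard Z80 encodings. The table and function names are invented for illustration. For each instruction it prints the equate you would write on the line before it or on the line after it. It also shows that 16-bit data is stored low byte first, so an equate like Value points at the low byte.

```python
# Standard Z80 encodings: (opcode bytes, operand size in bytes)
ENCODINGS = {
    "ld a,n":   ([0x3E], 1),
    "ld bc,nn": ([0x01], 2),
    "ld hl,nn": ([0x21], 2),
    "ld ix,nn": ([0xDD, 0x21], 2),   # 2-byte opcode, so the data starts at +2
}

def operand_offsets(name):
    """Return (length, offset from $ on the line before, offset from $ on the line after)."""
    opcode, operand_size = ENCODINGS[name]
    length = len(opcode) + operand_size
    before = len(opcode)          # Value =$+before, written above the instruction
    after = before - length       # Value =$<after>, written below it (negative)
    return length, before, after

def encode(name, value):
    """Assemble the instruction, storing 16-bit operands low byte first."""
    opcode, operand_size = ENCODINGS[name]
    if operand_size == 1:
        return opcode + [value & 0xFF]
    return opcode + [value & 0xFF, (value >> 8) & 0xFF]

for name in ENCODINGS:
    length, before, after = operand_offsets(name)
    print(f"{name:<9} {length} bytes  Value =$+{before}  (or Value =${after} after it)")
```

To patch a 16-bit operand from code, ld (Value),hl already writes L to Value and H to Value+1. That is the same low-byte-first order the instruction expects.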

One final tip: giving these labels a special prefix makes self-modifying
code stand out to both you and anyone reading your code.  I use the
underscore-at-sign combination (_@), because it is unique and looks cool :)

_@Value =$+1
 ld a,0

One more use for self-modifying code is looping without tying up a register
for the loop counter.  It's almost as fast as pushing and popping a register
used as the loop counter.  A common way to write such a loop:

 ld a,5                ; loop 5 times
Loop:
 ld (_@LoopCounter),a  ; update loop counter value

...

_@LoopCounter =$+1
 ld a,0                ; load the current value of the loop counter
 dec a                 ; decrease it
 jr nz,Loop            ; loop if not zero

This makes for a clean loop and doesn't tie up a register.  And because
nothing is pushed onto the stack, you can return from inside the loop without
popping off the loop counter.  When the loop starts, the initial count is in
A, and the ld (_@LoopCounter),a writes it into the data byte of the ld a,0
instruction.  At the end of the loop, the counter is loaded back into A as if
it were a constant, then decremented, and if it isn't 0, the loop continues.
At the top of the loop, the new value is saved again.  Because the counter is
reloaded at the end, the loop body is free to use A.  Note that the same store
instruction saves both the initial count and every updated count, which saves
space in the program.
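To watch the loop run, here is a tiny Python interpreter for just the opcodes the loop needs: ld a,n, ld (nn),a, inc b, dec a, jr nz and ret, using their standard Z80 encodings. It is a sketch, not an emulator. inc b stands in for the loop body, and the model tracks only the Z flag set by dec a, which is all jr nz checks here. Assembled at $d748, the counter's data byte lands at $d74f.

```python
ORIGIN = 0xD748
COUNTER = ORIGIN + 7  # _@LoopCounter: the data byte of "ld a,0" at $d74e

program = [
    0x3E, 0x05,                          # $d748        ld a,5
    0x32, COUNTER & 0xFF, COUNTER >> 8,  # $d74a Loop:  ld (_@LoopCounter),a
    0x04,                                # $d74d        inc b (stands in for the body)
    0x3E, 0x00,                          # $d74e        ld a,0 (operand gets patched)
    0x3D,                                # $d750        dec a
    0x20, 0xF7,                          # $d751        jr nz,Loop (-9 from $d753)
    0xC9,                                # $d753        ret
]

def run(mem, pc):
    """Execute until ret; return (A, B). Only the opcodes above are modelled."""
    a = b = 0
    zero = False                          # Z flag, as set by dec a
    while True:
        op = mem[pc]
        if op == 0x3E:                    # ld a,n
            a = mem[pc + 1]
            pc += 2
        elif op == 0x32:                  # ld (nn),a: address stored low byte first
            mem[mem[pc + 1] | (mem[pc + 2] << 8)] = a
            pc += 3
        elif op == 0x04:                  # inc b (its flag effects are not needed here)
            b = (b + 1) & 0xFF
            pc += 1
        elif op == 0x3D:                  # dec a
            a = (a - 1) & 0xFF
            zero = a == 0
            pc += 1
        elif op == 0x20:                  # jr nz,e: e is signed, relative to the next instruction
            e = mem[pc + 1]
            if e > 127:
                e -= 256
            pc += 2
            if not zero:
                pc += e
        elif op == 0xC9:                  # ret ends the simulation
            return a, b
        else:
            raise ValueError(f"opcode ${op:02X} at ${pc:04X} is not modelled")

mem = {ORIGIN + i: byte for i, byte in enumerate(program)}
a, b = run(mem, ORIGIN)
print(f"A={a} B={b} counter byte=${mem[COUNTER]:02X}")
```

After it runs, B is 5 (the body ran five times) and A is 0. The model also shows a practical limit: the jr displacement is a signed byte, so jr can only reach about 128 bytes backwards. A loop with a long body needs jp nz,Loop instead.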

Used with care, self modifying code can be a VERY powerful and helpful tool!

> One last thing: I'm not sure if I'm using the right terms for these
> concepts. As I understand it,
> machine code=hex code=1s and 0s compressed into bytes
> compiling=converting from a high-level language to a low-level language
> (as in BASIC to Intel asm)
> assembling=building=converting from a low-level language into hex code
> If these terms are incorrect or mixed-up, please correct me for future
> reference.

Sounds like you have the right ideas.



