Notes about chapter 4
PTR
"PTR" is actually not understood by NASM.
MASM (another assembler) needs it in places because MASM allows
code like
mov eax, val ; A <- val
mov eax, [val] ; A <- val
which apparently mean the same thing to it.
Under NASM, these have distinct meanings.
mov eax, val ; A <- address of val
mov eax, [val] ; A <- value of val
For example, imagine that "val" means the memory location 1234h and
stores the value 56h. The address is 1234h, and the value is 56h.
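MASM therefore needs keywords to say what is meant. "PTR" goes with a size,
as in "BYTE PTR val", to state how many bytes of memory to access;
"OFFSET" (covered below) asks for the address instead of the value.
NASM uses the size alone, as in "byte [val]".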
See the ptr_example.txt file.
It shows an assembly language program for NASM similar to the one on
chapter 4, slide 44.
Since the example uses the value 12345678h, which takes 4 bytes,
and because Intel uses a "Little Endian" ordering, this is stored
as 78, 56, 34, 12 (all hexadecimal) in memory.
Accessing this value as a byte sequence verifies this.
In class, I mentioned that we could also access the bytes individually in C.
The program "ptr.c" shows how to do this.
It uses a "union" to store myint and mychar (a character array) in the
same place in memory.
You don't need to know what a union is for our class, since it's covered
in the csc3320 class.
For our purposes, it lets us verify the byte ordering using a
high-level language (HLL).
OFFSET
"OFFSET" is another keyword needed by MASM.
In the slides for the Irvine book,
we see examples like this (chapter 4, slide 42).
mov esi, OFFSET bVal
mov esi, OFFSET wVal
These return the addresses of these variables, so if the data section
starts at 00404000h (as given in the example), and bVal is the first thing
in the data section, then bVal will have the address 00404000h.
Other variables will have subsequent addresses, based on their order and
the sizes of the variables before them.
For example, dVal has the address 00404003h, and takes up 32 bits
(4 bytes). Therefore, the next thing defined, dVal2, has the address
00404003h + 4h = 00404007h.
In NASM, we do not need to specify "OFFSET". See the program offset.asm
(under the link called offset.txt).
Notice that "dVal" starts at 60103bh. Don't read too much into these addresses;
the point is that the data section starts at a memory address, and the
address of each data value is based on that.
TYPE
Chapter 4's slide 49 shows examples of TYPE, which is not supported in
NASM.
Examples like these
mov eax, TYPE var1
mov eax, TYPE var2
mean that the size of each variable, in bytes, is stored in the A register.
Under NASM, we can achieve the same effect with code like this.
wVal: dw 0
wVal_size equ ($ - wVal)
...
mov eax, wVal_size
See the file type.txt for an example.
Is there another way to do this in NASM?
The document https://www.nasm.us/doc/ holds the answer
(e.g. see section 2.2.3).
This is a good resource to bookmark.
Indexed Operands
Slide 59 (chapter 4) shows an example of making a sum of an array.
Here (array_sum_pointer.txt) is an example for NASM.
In it, the array "arrayW" holds three values.
We put the address into RSI, use it to get a word, then add 2 to
RSI to advance the address to the next word.
mov rsi, arrayW
mov ax, [rsi] ; get the first word
add rsi, 2 ; increment the "pointer"
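The first program outputs 24576 as the sum.
This does not look right, but it actually is: 24576 in decimal is 6000h,
which is 1000h + 2000h + 3000h for the array values listed later in these
notes.
The second program gives the output in hexadecimal, which makes the
sum easy to verify.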
A third program shows a variation.
Instead of putting the address of arrayW into RSI, and using RSI to
access the value, it uses RSI as an index.
RSI starts with a value of 0, and we access "[arrayW + rsi]" to get the
value.
mov rsi, 0
mov ax, [arrayW + rsi] ; get the first word (different way to access)
add rsi, 2 ; increment the "pointer"
The comment on the last line is out of date; it should say that it is
advancing the "index", not the "pointer". Also, strictly speaking an
increment is +1; the word is used loosely here to mean advancing the index
to the next value.
Duplicate label as a "pointer variable"
Page 62 of the chapter 4 slides shows "ptrW" used as a pointer to
"arrayW".
Here (array_sum_pointer_v2.txt) is a log showing a program that does this.
The first version contains this:
arrayW: dw 1000h, 2000h, 3000h
; The next line does not work. Assembler expects more.
ptrW: arrayW
As the comment says, this does not work, and the assembler generates
an error based on the "ptrW" line.
The slide includes "DWORD", but that does not help here.
We can try this:
arrayW: dw 1000h, 2000h, 3000h
; The next line does not work. Assembler expects more.
;ptrW: arrayW
ptrW: dq arrayW
However, using "dq arrayW" does not quite work either.
The dq says "ptrW starts here and holds 64 bits";
it does not make ptrW a second name for arrayW.
What it does is store the address of arrayW at this location in memory.
In other words, "ptrW" is not equivalent to "arrayW",
but "[ptrW]" is equivalent to "arrayW".
While the program assembles, links, and runs,
the sum's result is obviously wrong.
The output contains the addresses, and those show that
"arrayW" and "ptrW" do not share the same address,
which is what this example is supposed to show.
It is close to a solution, though, and we'll revisit it in a minute.
The next example shows a simple solution.
ptrW: ; This works.
arrayW: dw 1000h, 2000h, 3000h
This defines the label "ptrW" but does not reserve any storage there.
The next line defines the label "arrayW" and includes the array values.
Thus, both labels point to the same memory location.
Finally, we revisit the second version.
[mweeks@gsuad.gsu.edu@snowball ~]$ diff array_sum4b.asm array_sum4d.asm
1,2c1,2
< ; Assemble: nasm -f elf64 array_sum4b.asm
< ; Link: gcc array_sum4b.o -o array_sum4b
---
> ; Assemble: nasm -f elf64 array_sum4d.asm
> ; Link: gcc array_sum4d.o -o array_sum4d
80c80
< mov rsi, ptrW
---
> mov rsi, [ptrW]
[mweeks@gsuad.gsu.edu@snowball ~]$
The "diff" command is good to use here, where we want to know the differences
between the two files. As you can see, the difference is that
the array_sum4d.asm version puts square brackets around ptrW, so that it
gets the value stored there. We can conclude that the value is in fact
the address of arrayW.
The LOOP instruction
Slide 66 shows the LOOP instruction.
The example is to find the sum of an array.
The first attempt at using it (array_sum5.txt)
shows that the sum is not correct. It also outputs the indices
and the values stored. These pieces of information help identify the
problem. The indices are correct, but the array values shown are not.
Closer inspection reveals that each array value shown is a mix of the
values defined in the array, i.e. a combination of neighboring array values.
This points to a problem with accessing the data on the correct boundaries.
A second example is shown that does things differently. One major
change is that the array values are 8 bits instead of 16.
The LOOP instruction works with the C register, and automatically
decrements it by 1, but then accessing the array values based on it
becomes a problem when working with values larger than bytes, since the
count steps by 1 while the addresses step by the size of each value.
A second issue is that we are accessing the values in reverse order.
This may or may not be a problem, depending on what you are trying to do.
Finding a sum is not a problem, since the result will be the same
regardless of whether we access the values in ascending or descending order.
With the changes indicated, the second example does show a LOOP instruction
with array accesses.
-Michael Weeks, Feb 29, 2024