This lab uses the SNOWBALL server. You will need to log in and create a log like you did in previous labs. Remember to turn in a version of the log that has the control characters removed. Also, there are prompts/questions to answer in bold.
For this lab, we will use the NASM assembler. NASM is the "Netwide ASseMbler" for x86 CPUs. You will notice that the code is a bit different from the previous labs, and you will likely find it to be a bit simpler. From this point forward, we will use the NASM assembler for this course unless specifically stated otherwise.
First, NASM uses the semicolon (;) to start a comment. The code from gcc contains a lot of directives. Let's compare the "hello world" program from the gcc compiler with the "hello world" program from University of Maryland, Baltimore County, specifically the "hello_64.asm" listing.
Here are some points to observe.
gcc version | NASM version | Comment |
.section .rodata | section .data | The data section |
.string "hello world." | db "Hello world", 0 | Defining a string |
.text | section .text | The code section |
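.globl main | global main | Makes the main label visible to the linker, so the startup code can find where the program starts |
pushq %rbp | push rbp | Put the stack frame reference on the stack. We are going to change rbp. |
movl $.LC0, %edi | mov rdi,fmt | di holds a pointer to the first argument: the string for puts (gcc) or the format string for printf (NASM) |
movq %rsi, -16(%rbp) | mov rsi,msg | In the NASM version, si holds a pointer to the start of the string to print. The gcc line is not an exact match: it stores the incoming rsi value on the stack. |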
call puts | call printf | Call the function that prints the text |
movl $0, %eax | mov rax,0 | Put the value 0 into the A register |
leave | pop rbp | Restore the rbp value from the start. |
Remember that rdi and edi refer to the same register: rdi names all 64 bits, while edi names only the low 32 bits. The same is true for rsp and esp, rax and eax, rbx and ebx, etc. Register rdi is the destination index register, used for strings and other arrays. Register rsi is the source index register, again for arrays. The x86 CPU uses segment registers (here is a good explanation) as well as index registers to specify a memory location. In other words, an address for the x86 needs one register to point to where the memory segment starts, and another one to hold the offset within that segment. In the pre-386 days, the segment register would hold the "upper 4 to 16 bits of the address" [source] so that you could address memory beyond the size of the registers.
The 0 after "Hello world" is a convention for strings. We know the start of the string, specified with a label. In non-object-oriented languages, how do you know the length or end of a string? One way is to mark the end of the string with a special character, and here it is the NUL character (0). Given the start of a string, any function can then iterate over the string's characters until it comes to a 0, and it then knows that the string's end has been reached.
Both programs use "main:" as a label, defining where the code starts. The rbp register holds the base pointer of the stack frame, and pushing it onto the stack is the first instruction in both programs.
Notice how the order of the source and destination operands differs. Under the gcc assembly, we have "command source, destination" (e.g. movl $0, %eax), while under nasm we have "command destination, source" (e.g. mov rax, 0). Questions to answer: what do "movl $0, %eax" (gcc) and "mov rax, 0" (nasm) mean? What are the results of these commands?
Both "puts" and "printf" are functions to send output to stdout. It's interesting to note that the original lab1.c program specified printf, but that the compiler generated code to call puts instead. Changes like this happen when a compiler optimizes our code for us; the result may in fact be more efficient. However, as the programmer you are responsible for your code. If there is some obscure bug on your particular system in puts but not in printf, looking at the higher-level language (HLL) code you might conclude "it cannot be that because my code uses printf". A bug that hides at the HLL level does not hide at the assembly language level.
Assembling the gcc version is done with "gcc -c lab1.s", then linking it is done with "gcc lab1.o -o lab1". Assembling the NASM version is done with "nasm -f elf64 hello_64.asm", then linking it with "gcc hello_64.o -o hello_64". Using "-f elf64" specifies the file format for the output. On SNOWBALL, omitting the "-f elf64" will generate some errors. There is an optional "-l hello_64.lst" that creates a "listing" file, with both the assembly language instructions and the machine language results.
; Assemble: nasm -f elf64 AddTwoSum_64.asm
; Link: gcc AddTwoSum_64.o -o AddTwoSum_64
; AddTwoSum_64.asm - Chapter 3 example.
; See http://www.asmirvine.com/gettingStartedVS2019/index.htm
; This is adapted for NASM.
section .data ; Data section, initialized variables
sum: dq 0
section .text
global main
main:
    mov rax, 5
    add rax, 6
    mov [sum], rax
    mov rax, 0
    ret
You can download this here.
Once you have a copy on SNOWBALL, use the "nasm" command to assemble it.
Then use the "gcc" command to link it.
Then run it.
Describe what this program does from the "main:" label to the end.
What do you observe when you run it? Does the program work?
Now let's look at an expanded version.
; Assemble: nasm -f elf64 AddTwoSum_64_pt2.asm
; Link: gcc AddTwoSum_64_pt2.o -o AddTwoSum_64_pt2
; Based on AddTwoSum_64.asm (by Kip Irvine)
; This is adapted for NASM.
extern printf ; We will use this external function
section .data ; Data section, initialized variables
mystr: db "%d", 10, 0 ; String format to use (decimal), followed by NL
sum: dq 0
section .text
global main
main:
    mov rax,5
    add rax,6
    mov [sum], rax
    ; Now print the result out
    mov rdi, mystr ; Format of the string to print
    mov rsi, [sum] ; Value to print
    mov rax, 0
    call printf
    mov rax, 0
    ret
You can download this here.
Once you have a copy on SNOWBALL, use the "nasm" command to assemble it.
Then use the "gcc" command to link it.
Then run it.
What do you observe? Does the program work?
What does this program do differently from the first one? (Describe
what the assembly language commands do.)
Assemble this again, only this time use
"nasm -f elf64 -l AddTwoSum_64_pt2.lst AddTwoSum_64_pt2.asm".
Use "cat" to show the AddTwoSum_64_pt2.lst file.
What do you observe in the file?
Run the command "xxd AddTwoSum_64_pt2" (which creates a hexadecimal dump
of the file's contents). What do you observe there, and how does it
relate to the .lst file? (Hint: look for the values B8 in the AddTwoSum_64_pt2.lst and b8 in the xxd output.)
Now we'll see a slightly different version, called AddTwoSum_64_pt3.asm. Download it, put it on SNOWBALL, and use the "nasm" command to assemble it then gcc to link it. Then run it. Do you observe any differences between this and AddTwoSum_64_pt2.asm? Use the "diff" command to show the differences between them, then explain what they are.
Now run AddTwoSum_64_pt2, and then enter the command
echo $?
Next, run AddTwoSum_64_pt3,
and then enter the command
echo $?
What do you observe about the output from these two commands?
Look up what a "return value" is under Unix/Linux,
describe what it is, and say how it relates to this lab.
Be sure to document where you got your answer, and use double-quotes
for anything you do not say yourself.
In this lab, we have: