Lab 4 -- Introduction to NASM, along with add and call

This lab uses the SNOWBALL server. You will need to log in and create a log like you did in previous labs. Remember to turn in a version of the log that has the control characters removed. Also, there are prompts/questions to answer in bold.

For this lab, we will use the NASM assembler. NASM is the "Netwide ASseMbler" for x86 CPUs. You will notice that the code is a bit different from the previous labs, and you will likely find it to be a bit simpler. From this point forward, we will use the NASM assembler for this course unless specifically stated otherwise.

First, we will use the semi-colon for comments. The code from gcc contains a lot of directives. Let's compare the "hello world" program from the gcc compiler with the "hello world" program from University of Maryland, Baltimore County, specifically the "hello_64.asm" listing.

Here are some points to observe.

gcc version               NASM version          Comment
.section .rodata          section .data         The data section
.string "hello world."    db "Hello world", 0   Defining a string
.text                     section .text         The code section
.globl main               global main           Where the code actually starts
pushq %rbp                push rbp              Put the stack frame reference on the stack. We are going to change rbp.
movl $.LC0, %edi          mov rdi,fmt           di holds a pointer to the start of the format string
movq %rsi, -16(%rbp)      mov rsi,msg           si holds a pointer to the start of the string to print
call puts                 call printf           Call the function that prints the text
movl $0, %eax             mov rax,0             Put the value 0 into the A register
leave                     pop rbp               Restore the rbp value from the start.

Remember that rdi and edi refer to the same register, but that rdi specifies all 64 bits while edi specifies the low 32. The same is true for rsp and esp, rax and eax, rbx and ebx, etc. Register rdi is the destination index register, used for strings and other arrays. Register rsi is the source index register, again for arrays. The x86 CPU uses segment registers (here is a good explanation) as well as index registers to specify a memory location. In other words, an address for the x86 needs one register to point to where the memory segment starts, and another one to hold the offset. In the pre-386 days, the segment register would hold the "upper 4 to 16 bits of the address" [source] so that you could address memory beyond the size of the registers.
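To make the segment-plus-offset idea concrete, here is a minimal C sketch of the classic 8086 real-mode rule: shift the 16-bit segment left by 4 bits, then add the 16-bit offset. The function name real_mode_address is made up for illustration. This models only the pre-386 scheme described above, not how a 64-bit program on SNOWBALL addresses memory.

```c
#include <stdint.h>

/* Hypothetical helper: combine a 16-bit segment and a 16-bit offset the way
   the 8086 does in real mode. The result needs more than 16 bits, which is
   why it is returned as a 32-bit value. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, real_mode_address(0x1234, 0x0010) gives 0x12350: the segment contributes 0x12340 and the offset adds 0x10.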

The 0 after "Hello world" is a convention for strings. We know the start of the string, specified with a label. In non-object oriented languages, how do you know the length or end of a string? One way is to mark the end of the string with a special character, and here it is the NUL character (0). Given the start of a string, any function can then iterate over the string's characters until it comes to a 0, and it then knows that the string's end has been reached.
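The scan-until-zero idea can be sketched in a few lines of C. The name my_strlen is illustrative only; the standard library's strlen does the same job, and this version is just a plain loop to show the mechanism.

```c
#include <stddef.h>

/* Illustrative helper: count the characters that come before the
   terminating NUL (0) byte. The NUL itself is not counted. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != 0) {   /* stop at the NUL terminator */
        n++;
    }
    return n;
}
```

Calling my_strlen("Hello world") walks 11 characters and stops at the 0 byte that the C compiler appends, the same byte the NASM listing spells out by hand with ", 0".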

Both programs use "main:" as a label, defining where the code starts. The rbp register holds the base pointer of the current stack frame, and it is pushed onto the stack as the first instruction in both programs.

Notice how the source and destinations are different. Under the gcc assembly, we have "command source, destination" (e.g. movl $0, %eax), while under nasm we have "command destination, source" (e.g. mov rax, 0). Questions to answer: what do "movl $0, %eax" (gcc) and "mov rax, 0" (nasm) mean? What are the results of these commands?

Both "puts" and "printf" are functions to send output to stdout. It's interesting to note that the original lab1.c program specified printf, but that the compiler generated code to call puts instead. Changes like this happen when a compiler optimizes our code for us; the result may in fact be more efficient. However, as the programmer you are responsible for your code. If there is some obscure bug on your particular system in puts but not in printf, looking at the higher-level language (HLL) code you might conclude "it cannot be that because my code uses printf". A bug that hides at the HLL level does not hide at the assembly language level.
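As a rough illustration of why the substitution is safe, the two C functions below print the same line. This is a sketch under assumptions: the exact call in lab1.c is not shown here, so the string and the function names are made up. Because each function uses the call's return value, the compiler has to keep each call as written.

```c
#include <stdio.h>

/* Assumed shape of the original call: the newline is part of the string.
   printf returns the number of characters it wrote. */
int greet_with_printf(void)
{
    return printf("Hello world\n");
}

/* What the compiler may substitute: puts appends the newline itself.
   puts returns a non-negative value on success. */
int greet_with_puts(void)
{
    return puts("Hello world");
}
```

Both calls send "Hello world" followed by a newline to stdout. With no format specifiers to process, puts has less work to do than printf.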

Assembling the gcc version is done with "gcc -c lab1.s", then linking it is done with "gcc lab1.o -o lab1". Assembling the NASM version is done with "nasm -f elf64 hello_64.asm", then linking it is done with "gcc hello_64.o -o hello_64". Using "-f elf64" specifies the file format for the output. On SNOWBALL, omitting the "-f elf64" will generate some errors. There is an optional "-l hello_64.lst" that creates a "listing" file, with both the assembly language instructions and the machine language results.

Writing a program for NASM

Here is an example program from your textbook. It has been adapted to work with NASM.

; Assemble:	  nasm -f elf64 AddTwoSum_64.asm
; Link:		  gcc AddTwoSum_64.o -o AddTwoSum_64

; AddTwoSum_64.asm - Chapter 3 example.
; See http://www.asmirvine.com/gettingStartedVS2019/index.htm
; This is adapted for NASM.

    section .data       ; Data section, initialized variables
sum: dq 0

    section .text
    global main
main:
   mov  rax, 5
   add  rax, 6
   mov  [sum], rax

   mov  rax, 0
   ret
You can download this here. Once you have a copy on SNOWBALL, use the "nasm" command to assemble it. Then use the "gcc" command to link it. Then run it. Describe what this program does from the "main:" label to the end. What do you observe when you run it? Does the program work?

Part 2

Now let's look at an expanded version.


; Assemble:	  nasm -f elf64 AddTwoSum_64_pt2.asm
; Link:		  gcc AddTwoSum_64_pt2.o -o AddTwoSum_64_pt2

; Based on AddTwoSum_64.asm (by Kip Irvine)
; This is adapted for NASM.

    extern  printf      ; We will use this external function

    section .data       ; Data section, initialized variables

mystr: db "%d", 10, 0   ; String format to use (decimal), followed by NL

sum: dq 0

    section .text
    global main
main:
   mov  rax,5
   add  rax,6
   mov  [sum], rax

                      ; Now print the result out
   mov   rdi, mystr   ; Format of the string to print
   mov   rsi, [sum]   ; Value to print
   mov   rax, 0
   call  printf

   mov  rax, 0
   ret
You can download this here. Once you have a copy on SNOWBALL, use the "nasm" command to assemble it. Then use the "gcc" command to link it. Then run it. What do you observe? Does the program work? What does this program do differently from the first one? (Describe what the assembly language commands do.)

Assemble this again, only this time use "nasm -f elf64 -l AddTwoSum_64_pt2.lst AddTwoSum_64_pt2.asm". Use "cat" to show the AddTwoSum_64_pt2.lst file. What do you observe in the file?

Run the command "xxd AddTwoSum_64_pt2" (which creates a hexadecimal dump of the file's contents). What do you observe there, and how does it relate to the .lst file? (Hint: look for the values B8 in the AddTwoSum_64_pt2.lst and b8 in the xxd output.)

Part 3

Now we'll see a slightly different version, called AddTwoSum_64_pt3.asm. Download it, put it on SNOWBALL, use the "nasm" command to assemble it, and then use "gcc" to link it. Then run it. Do you observe any differences between this and AddTwoSum_64_pt2.asm? Use the "diff" command to show the differences between them, then explain what they are.

Now run AddTwoSum_64_pt2, and then enter the command


    echo $?
Next, run AddTwoSum_64_pt3, and then enter the command

    echo $?
What do you observe about the output from these two commands? Look up what a "return value" is under Unix/Linux, describe what it is, and say how it relates to this lab. Be sure to document where you got your answer, and use double-quotes for anything you do not say yourself.
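The number that "echo $?" prints can also be read by any program that runs another program. Here is a minimal C sketch, assuming a POSIX system such as Linux. The helper name exit_status_of is made up for illustration; system, WIFEXITED and WEXITSTATUS are the standard calls.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command and report the status it exited
   with. Returns -1 if the command could not be run or did not exit
   normally (for example, if it was killed by a signal). */
int exit_status_of(const char *command)
{
    int status = system(command);
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling exit_status_of("./AddTwoSum_64_pt2") from the directory that holds the executable should report the same number that "echo $?" shows right after running the program by hand.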

In this lab, we have: