Lab 6 -- macros and subroutines

Have you noticed how often we need to repeat the same code? For example, we might want to print something out. To print a value, we define a format string and a data value (here named "sum") in the data section.


    mystr: db "%d", 10, 0   ; Format string (decimal), followed by newline and terminating 0
    sum:   dq 0

Then in the code section, we refer to the string and the value to print as follows.


                      ; Now print the result out
   mov   rdi, mystr   ; Format of the string to print
   mov   rsi, [sum]   ; Value to print
   mov   rax, 0       ; No floating-point arguments for printf
   call  printf

To print something out, we use a set of instructions like those above. Then we might want to print another value, so we repeat the same code with a minor variation. Later, we print a third value, and repeat the same code yet again. Isn't there a way to make this easier?

There are a couple of ways, and that is the subject of this lab. First, we will define and use a macro in part 1. Then we will define and use a subroutine in part 2.

Part 1

A macro is a kind of shorthand notation that the assembler (technically, its preprocessor) processes. Higher-level languages have macros too, such as the "#define" directive in C. A macro works like a smart find-and-replace: wherever the macro's name appears, it is "expanded" into whatever the programmer defined. This is perhaps best explained with an example.


    %macro  print 2 
    
                          ; Print arg2 using string arg1
       mov   rdi, %1      ; Format of the string to print
       mov   rsi, %2      ; Value to print
       mov   rax, 0       ; No floating-point arguments for printf
       call  printf

    %endmacro

The "%macro" and "%endmacro" directives mark where this macro begins and ends. The macro has the name "print" and takes 2 arguments. If you examine the code within the macro, you should recognize it as the instructions that we have used to print a value. The only detail left to notice is that the macro contains "%1" and "%2", which stand for the first and second arguments, respectively.

To use the macro, we write the macro's name, followed by the label of the format string and the memory location to print, separated by a comma.


   print mystr, [sum]

What the assembler will do is "expand" this to the code defined in the macro, substituting "mystr" for "%1" and "[sum]" for "%2". Instead of typing out 4 or more lines to call printf, we just specify the one line with "print". Think of it like a global find-and-replace operation. Whether you use the macro or type out the equivalent lines, the result is the same.
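
As a sketch of that substitution, the single line "print mystr, [sum]" is replaced by the body of the macro with the two arguments filled in. The instructions below are what the assembler ends up working with.


   mov   rdi, mystr   ; %1 was replaced by mystr
   mov   rsi, [sum]   ; %2 was replaced by [sum]
   mov   rax, 0
   call  printf

If you want to check an expansion yourself, running NASM with the "-E" option (for example, "nasm -E lab6_pt1.asm") should print the preprocessed source instead of assembling it.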

If we have several values to print with different format strings, our code might look like this.


   print mystr1, [val1]
   print mystr2, [val2]
   print mystr3, [sum]

It should be easy to see that working with macros can save the programmer a lot of time and energy. It can also make the code easier to debug, since it follows a regular pattern. Imagine that we do not use a macro and instead type the instructions for printf several times, and suppose that there is a subtle mistake in one copy, like switching rdi and rsi. Would you be able to spot the difference?

See this link for more information about macros.

Copy your code from the last lab, and replace any instances of calling printf with a macro as defined above. Call the result "lab6_pt1.asm". As with all labs, show the code (use "cat"), show the compilation, and that it runs.

Part 2

Another option for making repetitive code easier to use is the subroutine. Like a macro, a subroutine defines code that you can use again and again. However, a subroutine is a function, similar to a method in OOP. When you want to use a subroutine, you issue a call instruction to it, like the following.


   call mysubroutine

The subroutine must be defined in the code section. If you recall, all of our examples include a "ret" instruction as the last instruction of main. This returns control to whatever called your program, such as the OS shell. A subroutine is no different: it must end with a return instruction. When the CPU calls the subroutine, it must remember where to come back to. It does this by pushing the address of the instruction that follows the call (the current value of the Instruction Pointer, IP) onto the stack, then setting the IP to the subroutine's address. When the CPU reaches the return instruction, it pops that address from the stack and puts it back in the IP.
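
The call and return sequence described above can be sketched as follows. This is only an illustration: the subroutine body is a placeholder "nop", and the comments describe what the CPU does at each step.


   main:
       call  mysubroutine   ; Push the address of the next line, then jump to mysubroutine
       mov   rax, 0         ; Execution continues here after mysubroutine returns
       ret                  ; Return to whatever called main

   mysubroutine:
       nop                  ; The body of the subroutine would go here
       ret                  ; Pop the return address from the stack into the IP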

We can create a subroutine for printing an integer and call it like this.


   call print_int

The subroutine could use a pre-defined format string from the data section; if all that it prints is an integer, the string "%d" would work. But this raises a question: how does the subroutine know what value to print? We have to communicate the value somehow. One possible solution is to reserve a specific data value, defined with a label in the data section, then move the value to that location before calling "print_int". While this would work, you (the programmer) would need to remember which label to use for the move. Another solution is to use a register, such as A (rax). Move the value to print into A, then call the subroutine. This helps with efficiency, especially if the subroutine needs the value in a register anyway.
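
For comparison, here is a minimal sketch of the first option, where the value is passed through a dedicated label in the data section. The names "print_val" and "print_int_mem" are made up for this illustration, and "int_format" is assumed to be a "%d" format string defined in the data section. Note that the value has to pass through a register on its way to "print_val", because a mov cannot copy directly from one memory location to another.


   ; In the data section
   print_val:  dq 0              ; Hypothetical label: the value that print_int_mem will print

   ; In the code section, inside main
       mov   rax, [sum]
       mov   [print_val], rax    ; Store the value where the subroutine expects it
       call  print_int_mem

   ; In the code section, after main's ret
   print_int_mem:
       mov   rdi, int_format     ; Format of the string to print
       mov   rsi, [print_val]    ; Value to print, read from the agreed-upon label
       mov   rax, 0
       call  printf
       ret

The rest of this lab uses the register approach.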

The subroutine should be located after the main function's return. Make sure that "int_format" is defined in the data section. It could be the same as "mystr". The program should look like this:


   main:
       ; ... code goes here
       ; Put value to print in A register, if it is not there already
       mov   rax, [sum]
       call  print_int
       ; ... more code
       ret

   print_int:
       ; Instructions to print an int value go here
       mov   rdi, int_format  ; Format of the string to print
       mov   rsi, rax         ; Value to print
       mov   rax, 0           ; No floating-point arguments for printf
       call  printf
       ret

Copy your code from the previous lab (or part 1), and replace any instances of calling printf with a subroutine as defined above. Call the result "lab6_pt2.asm". As with all labs, show the code (use "cat"), show the compilation, and that it runs.

Questions:

Why is "int_format" better to use than "mystr" in the subroutine?
Why does a subroutine need a ret instruction, but a macro does not?
When you call a subroutine, how do you know if the registers will have the same values after it returns?
Suppose that it is important that your program remembers the value in register A after a subroutine call. What can you do outside of the subroutine to remember A's value?
Suppose that you write a subroutine that other people might use. Your subroutine uses (i.e. changes) the B register. When someone else uses your subroutine, they may have something important in B. What can you do inside of the subroutine so that B's value is the same upon return as it was when the subroutine started?
Does using a macro make a difference for the problem of remembering register values?
The command "mov rsi, %2" in the macro works fine when you invoke it with a command like "print mystr, [sum]". Suppose that you have the value in register A already, but you do not have it in memory, and use "print mystr, eax". Does it work? Why or why not?
Suppose that you have the value in register A already, but you do not have it in memory. If you use "print mystr, rax", does it work? Why or why not?
Using a macro, we can call it with "print mystr, [sum]". What if we use "print [mystr], [sum]" instead? Or "print mystr, sum"? Do these work? Why or why not?
Suppose that your subroutine pushes the C register on the stack, with something like "push rcx", but does not pop it off of the stack before the return. Will this work? Why or why not?
Suppose that your subroutine pushes the C register on the stack, with something like "push rcx". Later, it pops it off of the stack with "pop rbx". Will this work? Why or why not?
Which approach (macro versus subroutine) is likely to generate a larger executable program, and why?

What we learned

You can define macros to make programming in assembly language easier
You can define subroutines to make programming in assembly language easier
How to communicate values to macros and/or subroutines
The stack is an important data structure, but you need to be careful when using it
How a reference using square brackets, like "[sum]", is different from a reference without square brackets, like "mystr"