Lab 8 -- Input and Output

Part 1

We are going to examine how to get input. We need a place to store the input that we read, so here we create a buffer of 10 bytes. The 10 is just an example, and in fact the code does not even use that much. A more realistic number is 1000, or even 10000, but whatever number we choose, it might not be enough. The .bss section is for uninitialized data. The directive "resb" means "reserve bytes". Thus, the following lines reserve 10 bytes for use by the program.


        section .bss        ; Uninit data section
    mybuffer:  resb    10   ; reserve 10 bytes
There is a post at codereview.stackexchange.com that shows a nice example of using "equ" to define values, a lot like constant definitions. Actually, these are like the pre-processor "#define" directives in the C language, since the assembler will replace instances of the label with the equivalent number. This goes before the .text section.

        SYS_READ   equ     0          ; read text from stdin
        SYS_WRITE  equ     1          ; write text to stdout
        STDIN      equ     0          ; standard input
        STDOUT     equ     1          ; standard output
The STDIN and STDOUT stand for standard input and standard output, respectively. These are the defaults for input and output, but can be "piped" or re-directed when invoking the program from a shell. You are familiar with using a shell, even if you have not heard of this term: when you connect with the SNOWBALL server and type commands at the prompt, you are interacting with a shell.

As you may have guessed, SYS_READ and SYS_WRITE are the system calls for reading and writing. We will use the "syscall" instruction to make system calls. You may remember seeing instructions like "int 0x80", which is the older, 32-bit way of making system calls. Note that the SYS_READ and SYS_WRITE names defined here are simply a way to make the code readable. The assembly program should move values into certain registers before the "syscall" instruction, and the RAX register should hold the number of the desired system call. There are many others besides SYS_READ and SYS_WRITE, such as SYS_OPEN, SYS_CLOSE, and SYS_CHDIR, to name a few.

It is a good idea to push the RBP register at the beginning, then pop it from the stack at the end. The following lines are equivalent to the "enter 0, 0" instruction that you may have seen in other x86 code.


    main:
        push    rbp                 ; remember RBP
        mov     rbp, rsp
The first two of the following lines are equivalent to the "leave" instruction that you may have seen in other x86 code.

        mov     rsp, rbp            ; restore RSP
        pop     rbp                 ; restore RBP; with the line above, same as "leave"
        ret
You might want to add other things after "main:" or before "ret", but remembering RBP and restoring it are some of the first and last things that your program should do.
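
Putting the two pieces together gives a skeleton like the one below. This is only a sketch: it assumes NASM on 64-bit Linux, with the program linked through gcc so that the C startup code calls "main". The "global main" line and the zero return value are additions that the fragments above do not show.

```nasm
; Minimal skeleton: does nothing except set up and tear down its frame.
        global  main                ; make "main" visible to the linker

        section .text
main:
        push    rbp                 ; remember the caller's RBP
        mov     rbp, rsp            ; set up our own frame pointer

        ; ... the body of the program goes here ...

        mov     eax, 0              ; return value of main (0 = success)
        mov     rsp, rbp            ; restore RSP
        pop     rbp                 ; restore the caller's RBP
        ret
```

Assuming the file is saved as skeleton.asm (a made-up name), something like "nasm -f elf64 skeleton.asm" followed by "gcc skeleton.o -o skeleton" should build it.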

We can use the following to read a single character from STDIN. This is not as efficient as reading multiple characters at once. Here we use "fd" for "file descriptor". If we were to read from a file (or from a pipe), we would refer to the input source by its file descriptor. We also use a file descriptor to refer to the output destination. The number of the system call goes into the RAX register.

The file descriptor goes in register RDI, while the address of the buffer for the characters read goes in RSI. A buffer is simply an array. Remember that we defined "mybuffer" as an array of bytes in the .bss section. The number of characters to read goes into RDX. The program will try to read that many, but it is possible that the number actually read is fewer, for example if we are at the end of the file.


        mov     rsi, mybuffer        ; where to store chars read
        mov     rdi, STDIN           ; fd of input
        mov     rdx, 1               ; characters to read
        mov     rax, SYS_READ        ; function to call
        syscall                      
After the syscall, the RAX register will contain the number of characters read. We can store this at the label "temp", defined in the data section (or the bss section); since the code stores EAX, "temp" needs room for 4 bytes. Next, compare the number read to 0 with the "cmp" instruction. If they are equal, i.e. the number read is zero, jump to the label "eof_reached", which we define later in the program. "EOF" is short for "end of file", and simply means the end of the input, whether the input is a traditional file or not. Otherwise, jump to the label "read_again".

        mov     [temp], eax          ; number of chars read
        cmp     eax, 0
        je      eof_reached          ; Did we read 0 chars?
                                     ; if we read 0 chars, we got EOF
        jmp     read_again           ; Read another character
The label "read_again" should be defined before the code that reads a character.
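
The lab does not show a definition for "temp". One possible definition, offered only as a sketch, reserves a single doubleword in the .bss section next to "mybuffer". The "resd" directive reserves 4-byte units, which matches the 4 bytes that "mov [temp], eax" writes.

```nasm
        section .bss
    temp:      resd    1        ; one doubleword (4 bytes), enough to hold EAX
```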

You will need to define "eof_reached", and put some code there. If nothing else, it should load an integer into the RAX register (the value that the program returns), restore RSP and RBP as shown earlier, and finish with the "ret" instruction.

Put all of this together, along with anything else that you need to make a fully functioning program. Show the program, and that it assembles and runs. When you run it, it will expect input from the keyboard, so type something in (such as your name), and press return. Press the CTRL-D key (that is, hold down the control key and press the D key) to indicate the end of your input.

Part 2 - output

One problem with the code from Part 1 is that it is hard to verify that it works. In this part, we will echo the characters that we read back to STDOUT. Code to do this is as follows.

        mov     rsi, mybuffer        ; buffer (char *)
        mov     rdi, STDOUT          ; file descriptor (fd)
        mov     rdx, 1               ; count (length)
        mov     rax, SYS_WRITE
        syscall        
The count of characters to print goes into the RDX register. Here we will use 1, to print a single character, though it is more efficient to work with a larger amount. The RSI register gets the buffer's address. The RDI register stores the file descriptor, and we use the STDOUT value that we defined above. The RAX register contains the action to perform; here it uses the SYS_WRITE value. Note that the order in which we put these values into these registers does not matter, as long as the registers have the correct values when the computer reaches the "syscall" instruction.

If we read in N characters, we could repeat these instructions N times. Of course, we could also set the count (RDX) to N instead. However we do it, we will not end up with a program that reads in all of the input and then writes all of the output. This is because we do not know ahead of time exactly how much input there will be. (There can be exceptions. For example, if we are dealing with files, we can determine the file size, dynamically allocate just enough memory to hold it all, then read the file in all at once. But this does not guarantee that the program will be efficient; for example, the file size might exceed the available RAM.) In general, we will not know the size of the input in advance. Instead, the program will read in some input, write the output, then repeat the process until we reach the end of the input. The only decision that we make in advance is how much space to dedicate to the buffer. If you do work with more than 1 character at a time, be aware that the count used for output should be the number of characters actually read, not the number that you were expecting to read.
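
To illustrate that last point, here is a sketch of a loop that asks for up to BUFSIZE bytes per read and then writes only as many bytes as the read returned. It assumes NASM on 64-bit Linux with gcc doing the linking. The names BUFSIZE and "done" are made up for this example, and treating a negative return value as a reason to stop is an addition that the single-character code above does not include. The sketch also assumes, for simplicity, that each write outputs everything it was asked to.

```nasm
; Sketch: echo stdin to stdout, up to BUFSIZE bytes per read.
        SYS_READ   equ     0          ; read from a file descriptor
        SYS_WRITE  equ     1          ; write to a file descriptor
        STDIN      equ     0          ; standard input
        STDOUT     equ     1          ; standard output
        BUFSIZE    equ     1000       ; hypothetical buffer size

        section .bss
mybuffer:  resb    BUFSIZE            ; reserve BUFSIZE bytes

        section .text
        global  main
main:
        push    rbp                   ; remember RBP
        mov     rbp, rsp

read_again:
        mov     rsi, mybuffer         ; where to store chars read
        mov     rdi, STDIN            ; fd of input
        mov     rdx, BUFSIZE          ; ask for up to BUFSIZE chars
        mov     rax, SYS_READ
        syscall                       ; RAX = number of chars read
        cmp     rax, 0
        jle     done                  ; 0 means EOF, negative means error

        mov     rdx, rax              ; count = chars actually read
        mov     rsi, mybuffer         ; buffer to write from
        mov     rdi, STDOUT           ; fd of output
        mov     rax, SYS_WRITE
        syscall
        jmp     read_again            ; go back for more input

done:
        mov     eax, 0                ; return 0 from main
        mov     rsp, rbp              ; restore RSP
        pop     rbp                   ; restore RBP
        ret
```

Depending on how gcc is configured, linking may need the "-no-pie" option, because the code uses the absolute address of "mybuffer".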

To make this work with the code from Part 1, the code performing the write should go between the check for EOF and the jump to "read_again". Like before, show your program for Part 2, assemble and link it, and show that it works.

Part 3 - I/O redirection and piping

The defaults for standard input and standard output are the keyboard and the terminal window. On a computer like SNOWBALL, running Linux/Unix, you can easily redirect input/output to/from a program. You can create a file of test input and use it with the program. Here is one to use; call it "testfile".

One way is to pipe the output from one command to the program. The next two examples do this.


    echo "abc" | ./lab8_pt2 
The "echo" command normally echoes the string to the output, so a command like echo "abc" simply prints "abc". In the example above, the string "abc" is piped to the lab8_pt2 program. Note that "echo" adds a newline after the "c", so the newline is the last character that the program reads. After that, the program should attempt another read, find that 0 characters were read, and quit. Show that this works.

The second example uses the "cat" command, and it will output the contents of "testfile". Here, however, it sends the output from the cat command to the lab8_pt2 as the input.


    cat testfile | ./lab8_pt2 
Show that this works.

Next, we have examples of redirection using files. First, the following command says to use "testfile" as the input to "lab8_pt2".


    ./lab8_pt2 < testfile
Verify that this works. It should appear to be the same as the "cat" example from earlier.

A second file redirection is as follows.


    ./lab8_pt2 > testout
When you run that, it will send the output from "lab8_pt2" to "testout", overwriting the "testout" file if it already exists. However, the program expects input, and this command does not specify an alternate input source. So it will still expect you to type using the keyboard until it gets CTRL-D. Therefore, anything that you type, before the CTRL-D, will be stored in the "testout" file. Try this, and show that it works. Use "cat" on "testout" to verify it.

Now we can put the input redirection and output redirection together:


    ./lab8_pt2 < testfile > testout
This causes "testfile" to be the input, and "testout" receives the output. When done, the two files should be the same. Try it, and verify this with the "diff" command.

You might wonder why we have gone to so much trouble making a program that simply echoes the input to the output. The idea here is to give you some experience with a program that can read input and write output. You could easily add to this program to do something more interesting, such as filtering out non-ASCII characters from a file, or automatically capitalizing a file's contents, or automatically making a file's contents lower case.
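
As one example of such an addition, the fragment below sketches how the capitalizing idea might look for the single-character version of the program. It is not a complete program: it is meant to sit between the check for EOF and the code that writes the character, and it assumes that the character just read is the first byte of "mybuffer". The label "not_lower" is made up for this example. The fragment changes AL, which is harmless here because the write code loads RAX again before its syscall.

```nasm
        mov     al, [mybuffer]       ; the character just read
        cmp     al, 'a'
        jb      not_lower            ; below 'a': leave it alone
        cmp     al, 'z'
        ja      not_lower            ; above 'z': leave it alone
        sub     al, 32               ; 'a' - 'A' is 32 in ASCII
        mov     [mybuffer], al       ; store the upper-case letter back
not_lower:
```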

Questions

In this lab, we have learned: