We are going to examine how to get input. We will need a place to store the input that we read, so here we create a buffer of 10 bytes. The 10 is just an example, and in fact the code does not even use that much. A more realistic number is 1000, or even 10000, but whatever number we choose, it might not be enough. The .bss section is for uninitialized data. The directive "resb" means "reserve bytes". Thus, the following lines reserve 10 bytes for use with the program.
section .bss ; Uninit data section
mybuffer: resb 10 ; reserve 10 bytes
There is a post at
codereview.stackexchange.com
that shows a nice example of using "equ" to define values, a lot like
constant definitions. Actually, these are like the pre-processor
"#define" directives in the C language, since the assembler will replace
instances of the label with the equivalent number.
This goes before the .text section.
SYS_READ equ 0 ; read text from stdin
SYS_WRITE equ 1 ; write text to stdout
STDIN equ 0 ; standard input
STDOUT equ 1 ; standard output
The STDIN and STDOUT stand for standard input and standard output,
respectively. These are the defaults for input and output, but can be
"piped" or re-directed when invoking the program from a shell.
You are familiar with using a shell, even if you have not heard of this
term: when you connect with the SNOWBALL server and type commands at
the prompt, you are interacting with a shell.
As you may have guessed, SYS_READ and SYS_WRITE are the system calls for reading and writing. We will use the "syscall" command to make system calls. You may remember seeing commands like "int 0x80", which is the old way (i.e. 32 bit) of doing system calls. Note that the SYS_READ and SYS_WRITE names defined here are simply ways to make the code readable. The assembly program should move values into certain registers before the "syscall" command, and the RAX register should hold the number of the desired system call. There are many others besides SYS_READ and SYS_WRITE, such as SYS_OPEN, SYS_CLOSE, and SYS_CHDIR, to name a few.
It is a good idea to push the RBP register at the beginning, then pop it from the stack at the end. The following lines are equivalent to the "enter" command (with both operands set to 0) that you may have seen in other x86 code.
main:
push rbp ; remember RBP
mov rbp, rsp
The following lines are equivalent to the "leave" command that you
may have seen in other x86 code.
mov rsp, rbp ; restore RSP
pop rbp ; restore RBP (these two lines are the same as "leave")
ret
You might want to add other things after "main:" or before "ret",
but remembering RBP and restoring it are some of the first and last
things that your program should do.
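To show where these pieces sit in a source file, here is a minimal sketch of a program skeleton. It assumes NASM syntax on 64-bit Linux and linking with gcc so that "main" is called by the C start-up code. The file name and the build commands in the comments are assumptions, not part of the lab.

```nasm
; skeleton.asm: a minimal sketch of the program layout (hypothetical file name).
; Assumed build commands:
;   nasm -f elf64 skeleton.asm -o skeleton.o
;   gcc -no-pie skeleton.o -o skeleton

        section .bss                ; uninitialized data section
mybuffer: resb 10                   ; reserve 10 bytes

        section .text
        global  main                ; make "main" visible to the linker

main:
        push    rbp                 ; remember RBP
        mov     rbp, rsp

        ; ... the body of the program goes here ...

        mov     eax, 0              ; return value of main
        mov     rsp, rbp            ; restore RSP
        pop     rbp                 ; restore RBP
        ret
```

The skeleton does nothing except return 0, but it should assemble and link, which makes it a convenient starting point.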
We can use the following to read a single character from STDIN. This is not as efficient as reading multiple characters. Here we use "fd" for "file descriptor". If we were to read from a file (or from a pipe), we would refer to the input source by the file descriptor. We also use a file descriptor to refer to the output destination. The number of the system call goes into the RAX register.
The file descriptor goes in register RDI, while the address of the buffer for the characters read goes in RSI. A buffer is simply an array. Remember that we defined "mybuffer" as an array of bytes in the .bss section. The number of characters to read goes into RDX. The program will try to read that many, but it is possible that the number actually read is fewer, for example if we reach the end of the file.
mov rsi, mybuffer ; where to store chars read
mov rdi, STDIN ; fd of input
mov rdx, 1 ; characters to read
mov rax, SYS_READ ; function to call
syscall
After the syscall, the RAX register will contain the number of characters
read.
We can store this at the label "temp", defined in the data section
(or the bss section) with room for at least 4 bytes, since the code
below stores the 32-bit EAX there.
Next, compare the number read to 0 with the "cmp" command.
If the result is equal, i.e. the number read is zero, jump to the
label "eof_reached", which we define later in the program.
"EOF" is short for "end of file", and simply means the end of the input,
whether the input is a traditional file or not.
Otherwise, jump to the label "read_again".
mov [temp], eax ; number of chars read
cmp eax, 0
je eof_reached ; Did we read 0 chars?
; if we read 0 chars, we got EOF
jmp read_again ; Read another character
The label "read_again" should be defined before the code that reads
a character.
You will need to define "eof_reached", and put some code there. If nothing else, it should load an integer into the RAX register (this becomes the return value of main), restore RBP as shown earlier, and finish with a return command.
Put all of this together, along with anything else that you need to make a fully functioning program. Show the program, and that it assembles and runs. When you run it, it will expect input from the keyboard, so type something in (such as your name), and press return. Press the CTRL-D key (that is, hold down the control key and press the D key) to indicate the end of your input.
Next, we can use the following to write a single character from the buffer to STDOUT.
mov rsi, mybuffer ; buffer (char *)
mov rdi, STDOUT ; file descriptor (fd)
mov rdx, 1 ; count (length)
mov rax, SYS_WRITE
syscall
The count of characters to print goes into the RDX register. Here we use 1,
to print a single character, though it is more efficient to work with
a larger amount. The RSI register gets the buffer's address.
The RDI register stores the file descriptor, and we use the STDOUT value
that we defined above. The RAX register contains the action to perform;
here it uses the SYS_WRITE value. Note that the order in which we put these
values into these registers does not matter, as long as the registers
have the correct values when the computer reaches the "syscall" command.
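As an illustration of a write with a count larger than 1, the following minimal sketch prints a fixed message with a single SYS_WRITE call. The names "msg" and "MSGLEN" are hypothetical, and the build commands in the comments are assumptions (NASM on 64-bit Linux, linked with gcc).

```nasm
; write_msg.asm: a sketch of one SYS_WRITE call (hypothetical file name).
; Assumed build commands:
;   nasm -f elf64 write_msg.asm -o write_msg.o
;   gcc -no-pie write_msg.o -o write_msg

SYS_WRITE equ 1                     ; write text to stdout
STDOUT    equ 1                     ; standard output

        section .data
msg:    db "hello", 10              ; the text, then a newline (byte value 10)
MSGLEN  equ $ - msg                 ; length computed by the assembler

        section .text
        global  main

main:
        push    rbp                 ; remember RBP
        mov     rbp, rsp

        mov     rax, SYS_WRITE      ; system call number
        mov     rdi, STDOUT         ; file descriptor (fd)
        mov     rsi, msg            ; buffer (char *)
        mov     rdx, MSGLEN         ; count (length)
        syscall                     ; afterwards RAX holds the bytes written

        mov     eax, 0              ; return value of main
        mov     rsp, rbp            ; restore RSP
        pop     rbp                 ; restore RBP
        ret
```

Letting the assembler compute MSGLEN with "$ - msg" avoids counting the characters by hand, so the count stays correct if the message changes.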
If we read in N characters, we would repeat the commands N times. Of course, we could also set the count (RDX) to N instead. Either way, we will not end up with a program that reads in all of the input and then writes all of the output, because we do not know ahead of time exactly how much input there will be. (There can be exceptions. For example, if we are dealing with files, we can determine the file size, dynamically allocate just enough memory to hold it all, then read the file in all at once. But this does not guarantee that the program will be efficient, such as if the file size exceeds available RAM.)
In general, we will not know the size of the input in advance. Instead, the program will read in some input, write the output, then repeat the process until we reach the end of the input. The only decision that we will make in advance is how much space to dedicate to the buffer. If you do work with more than 1 character at a time, be aware that the count used for output should be the number of characters actually read, not the number that you were expecting to read.
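The point about using the number of characters actually read can be sketched as follows. This is one possible arrangement, not the required lab solution: it assumes a hypothetical buffer size "BUFSIZE" of 1000 bytes, treats a negative result from the read as a reason to stop, and does not retry if the write call writes fewer bytes than requested.

```nasm
; echo_buf.asm: a sketch of a buffered read/write loop (hypothetical file name).
; Assumed build commands:
;   nasm -f elf64 echo_buf.asm -o echo_buf.o
;   gcc -no-pie echo_buf.o -o echo_buf

SYS_READ  equ 0                     ; read text from stdin
SYS_WRITE equ 1                     ; write text to stdout
STDIN     equ 0                     ; standard input
STDOUT    equ 1                     ; standard output
BUFSIZE   equ 1000                  ; hypothetical buffer size

        section .bss
mybuffer: resb BUFSIZE              ; reserve BUFSIZE bytes

        section .text
        global  main

main:
        push    rbp                 ; remember RBP
        mov     rbp, rsp

read_again:
        mov     rax, SYS_READ       ; function to call
        mov     rdi, STDIN          ; fd of input
        mov     rsi, mybuffer       ; where to store chars read
        mov     rdx, BUFSIZE        ; read at most BUFSIZE characters
        syscall                     ; RAX = number of characters actually read
        cmp     rax, 0
        jle     eof_reached         ; 0 means EOF, negative means an error

        mov     rdx, rax            ; write only as many as were read
        mov     rax, SYS_WRITE
        mov     rdi, STDOUT         ; fd of output
        mov     rsi, mybuffer       ; buffer (char *)
        syscall
        jmp     read_again          ; go back for more input

eof_reached:
        mov     eax, 0              ; return value of main
        mov     rsp, rbp            ; restore RSP
        pop     rbp                 ; restore RBP
        ret
```

All of the argument registers are reloaded on every pass through the loop, since the "syscall" command overwrites RAX with its result and also changes RCX and R11.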
To make this work with the code from part 1, the code performing the write should be done between the check for EOF and the jump to "read_again". Like before, show your program for part 2, assemble and link it, and show that it works.
One way to give the program input without typing it is to pipe the output from another command to the program. The next two examples do this.
echo "abc" | ./lab8_pt2
The "echo" command normally echoes the string to the output, so a command
like echo "abc"
simply prints "abc" followed by a newline. In the example above,
that output is piped to the lab8_pt2 program. After the last character
(the newline that follows "c") is read, the program should attempt another read,
find that 0 characters were read, and quit.
Show that this works.
The second example uses the "cat" command, which outputs the contents of "testfile". Here, however, the output from the cat command is sent to lab8_pt2 as its input.
cat testfile | ./lab8_pt2
Show that this works.
Next, we have examples of redirection using files. First, the following command says to use "testfile" as the input to "lab8_pt2".
./lab8_pt2 < testfile
Verify that this works. It should appear to be the same as the "cat" example
from earlier.
A second file redirection is as follows.
./lab8_pt2 > testout
When you run that, it will send the output from "lab8_pt2" to "testout",
overwriting the "testout" file if it already exists. However, the
program expects input, and this command does not specify an alternate
input source. So it will still expect you to type using the keyboard
until it gets CTRL-D.
Therefore, anything that you type, before the CTRL-D, will be stored
in the "testout" file.
Try this, and show that it works.
Use "cat" on "testout" to verify it.
Now we can put the input redirection and output redirection together:
./lab8_pt2 < testfile > testout
This causes "testfile" to be the input, and "testout" receives the output.
When done, the two files should be the same. Try it, and verify this
with the "diff" command.
You might wonder why we have gone to so much trouble making a program that simply echoes the input to the output. The idea here is to give you some experience with a program that can read input and write output. You could easily add to this program to do something more interesting, such as filtering out non-ASCII characters from a file, or automatically capitalizing a file's contents, or automatically making a file's contents lower case.
Questions
In this lab, we have learned: