Introduction

Programs written in assembly language are machine dependent and can only be used on a specific processor. They have little to no structure and consist of far more lines of code than high-level language programs. To control a processor, you must speak its language. The language of a computer is its machine language, and the vocabulary of that language is its instruction set. While no two machine languages are the same, they are quite similar, since they are all built on similar underlying hardware principles.

Assembly Syntax

An assembly language program consists of a collection of basic instructions and assembler directives. There are no constructs like loops and selection statements commonly found in high-level language programs. Instead, you must use the primitive instructions of the processor to implement any high-level language constructs that are needed.
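
As a minimal sketch of what this means in practice, the following program sums the integers 1 through 5, using a conditional branch and a jump in place of a loop construct. The bgt, addi, and j instructions have not been introduced yet, and bgt is a pseudo-instruction that simulators such as MARS and SPIM expand for you. The register choices and the label names loop and done are illustrative assumptions.

```mips
           .text
main:
        li    $t0, 0           # sum = 0
        li    $t1, 1           # i = 1
        li    $t2, 5           # upper limit of the loop
loop:
        bgt   $t1, $t2, done   # leave the loop once i > 5
        add   $t0, $t0, $t1    # sum = sum + i
        addi  $t1, $t1, 1      # i = i + 1
        j     loop             # go back and test the condition again
done:
        li    $v0, 10          # terminate the program
        syscall
```

When execution reaches done, register $t0 holds 15. The test at the top of the loop plays the role of a while condition, and the unconditional jump at the bottom closes the loop.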

All assembly languages follow a similar pattern in terms of their syntax and program structure. Some may have specific requirements based on the given architecture, but once you learn one assembly language, others are easy to understand. In this section, we begin an introduction to the MIPS assembly language by looking at the structure and syntax of a simple program.

Assembler Directives

Every MIPS program must contain a text segment, which contains the executable instructions. The text segment is indicated using the assembler directive:

.text

Assembly languages use assembler directives to provide information as the program is translated to machine code. In MIPS assembly, directives are specified using dot notation, an identifier that begins with a period.
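
The text segment is usually paired with a data segment. The sketch below assumes the .data and .word directives and the lw instruction as they are accepted by common MIPS simulators such as MARS and SPIM; none of them is covered in this section, and the label name value is hypothetical.

```mips
           .data               # what follows is data, not instructions
value:     .word 50            # reserve one 32-bit word initialized to 50

           .text               # what follows is executable instructions
main:
        lw    $t1, value       # load the word stored at label 'value' into $t1

        li    $v0, 10          # terminate the program
        syscall
```

Each directive tells the assembler how to treat the lines that follow it; neither .data nor .text produces a machine instruction of its own.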

Statements

The statements within an assembly language program are written one per line, with each statement consisting of three parts:

label operation operands

The operation is the symbolic representation of the machine code operation to be performed, and the operands are the arguments to that operation. The number and type of operands depend on the specific instruction. At most, there will be three operands, separated by commas:

add $t0, $t1, $t2

while some instructions take a single operand:

j start

In MIPS, each operand can be a register, a label, or an immediate value. The actual type depends on the specific instruction. In the above examples, the add operation requires three registers, while the j instruction requires either a label or an immediate integer value.

Immediate Values

In assembly language, literal values are known as immediate values. Whether an immediate value can be used as an argument depends on the specific instruction. Here, the li instruction (load immediate) is used to store an immediate value into a register:

li $t1, 50

Immediate integer values can be specified in assembly using decimal, hexadecimal or octal notation.
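
The three notations can be sketched as follows, with each line loading the same value. The 0x prefix for hexadecimal is standard across MIPS assemblers. The leading-zero form for octal is an assumption borrowed from C-style assemblers and may not be accepted by every tool, so check your assembler's documentation before relying on it.

```mips
           .text
main:
        li    $t1, 50          # decimal
        li    $t2, 0x32        # hexadecimal: 3*16 + 2 = 50
        li    $t3, 062         # octal (where supported): 6*8 + 2 = 50

        li    $v0, 10          # terminate the program
        syscall
```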

Labels

Assembly languages do not use variables or functions. Instead, labels are used to identify and name memory locations that hold instructions or data. The label part of a statement is optional. It can be omitted, as in the examples above, or included when the given location in memory needs to be referenced within the program:

main: li $t1, 50

A label alone may be provided on a line, but it always refers to the location of the next statement or word in memory:

main:
li $t1, 50

A label is a valid identifier that ends with a colon (:). The rules for naming an identifier are similar to those in a high-level language. A label can consist of letters, digits, and the underscore (_), but the first character cannot be a digit. The names of instructions and assembler directives are reserved and cannot be used as identifiers.
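
A short sketch of these naming rules follows. The label names skip_1 and _done are hypothetical, and the j instruction is used only to give the labels something to do. The commented-out lines show names that the rules above disallow.

```mips
           .text
main:
        j     skip_1           # transfer control to the location named skip_1
        li    $t0, 1           # never executed: the jump passes over it
skip_1:                        # letters, a digit, and an underscore: valid
_done:                         # a leading underscore is also valid
        li    $v0, 10          # both labels above name this instruction
        syscall

# 2nd_try:                     # invalid: the first character is a digit
# add:                         # invalid: add is the name of an instruction
```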

Comments

Comments can be specified in an assembly language program using the hash symbol (#). As in Python, everything from the # to the end of the line is considered a comment.

# This is an example that illustrates a comment in MIPS.

Sample Program

To illustrate the style and syntax of an assembly language program, consider the following simple program, which adds two numbers, both of which are stored in registers, and stores the result in a third register:

# example1.asm
# This is an example that illustrates the various components of
# an assembly language program.
#
           .text
           # Program code goes in the text segment. This is an
           # example of an assembler directive.

main:
        # Initialize two registers and add their values
        li $t1, 50
        li $t2, 18
        add $t0, $t1, $t2

        # Terminate the program
        li $v0, 10
        syscall

An assembly language program is written in a text file in the same way that you would write a program in any high-level language. The .asm file extension is used for assembly language programs and modules.

Structure

MIPS assembly is a free-format language, which means that it does not require labels, instructions, or directives to be placed at any specific position on a line. By tradition, however, labels start at the leftmost position (column 1), and assembler directives and instructions are indented from the left. This allows you to quickly find specific labels as you scan the lines of code. In this example, the directive is indented 11 spaces and the add instruction 8 spaces.

Initiation

Every MIPS program must contain a main: label to indicate the starting point of execution. After the program is loaded into memory, execution begins with the first statement following the main: label. Technically, the label can be placed anywhere in the program, but good design calls for it to be placed at the top of the assembly file.

Termination

In a high-level language, program termination is handled automatically when the last statement in the module is executed (as with Python) or when the main function returns (as with C and Java). In assembly language, no such steps are performed automatically. To end or exit a program, we must terminate it using a system call. In MIPS, this is done using two statements, as shown at the end of the sample program:

li $v0, 10 # These two lines serve as a halt statement.
syscall # More on this later - just use them for now.

These lines are needed in every MIPS assembly language program. We will cover the actions of these two statements in a later section. For now, simply use them at the end of each program in order to correctly terminate the program.

Program Translation

The instructions in an assembly language program must be translated into the bit patterns of the underlying machine code instructions. This translation is done using an assembler.

Assembly languages typically allow labels to be used before they have been defined, whereas many high-level programming languages require a name to be declared before it is used. Thus, the translation process requires two passes:

1. Translate each assembly statement into a valid machine code instruction, with temporary addresses.
2. Replace temporary addresses with the actual addresses after all of the labels have been defined.
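
The two passes can be traced on a small program that contains a forward reference. This is a sketch under the assumption that each statement shown assembles to a single 4-byte instruction, which holds for li with small values such as these. The label finish is hypothetical, and addresses are given as offsets from main because the actual starting address depends on the assembler or simulator.

```mips
           .text
main:                          # offset 0
        li    $t0, 0           # offset 0
        j     finish           # offset 4: 'finish' is not yet defined, so
                               #   pass 1 emits a temporary address here
        li    $t0, 1           # offset 8: skipped at run time
finish:                        # offset 12: pass 1 records finish = main + 12
        li    $v0, 10          # offset 12
        syscall                # offset 16

# After pass 1 the assembler knows:  main = main + 0,  finish = main + 12
# In pass 2 it revisits the j at offset 4 and fills in the address of finish.
```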

The assembler produces object code, stored in a file called an object file. This file contains the machine instructions, data, and bookkeeping information. An object file cannot be executed because it may reference routines or data in other files that have not yet been assembled. Such a reference is known as an unresolved reference. An object file on UNIX systems contains six distinct sections:

The object file header describes the size and position of the other parts since those can vary depending on the contents of the source file.

The text segment contains the machine language code for functions in the source file. This part may not be complete if functions defined outside of the module are called. In that case, there will be unresolved references to those functions.

The data segment contains a binary representation of the global data defined in the source file. This part is not a complete data segment for the program, but only for the given module.

The relocation information identifies instructions and data that depend on absolute addresses. These must be changed or relocated to reflect their true locations when the complete program is constructed.

The symbol table part contains a list of labels used in the source file along with their corresponding addresses. This information is used to resolve addresses within the given module and other modules when the final program is constructed.

Finally, the debugging information part contains information that a debugger can use to relate the machine code back to the source file, which helps a programmer debug large programs.

The assembler must translate each file or module in a program separately. During this process, it can only determine the addresses of the local labels, those used within the file itself. A second program called a linker is used to combine the collection of object files into an executable file. In doing so, it resolves the addresses of labels that are referred to in other object files. This is known as resolving external references.

The use of a linker allows a program to be split into pieces that are stored in different files. Each file contains a logically related collection of subroutines and data structures that form a module in a larger program.
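
A minimal sketch of an external reference is shown below, assuming two hypothetical source files, main.asm and helper.asm. It relies on the .globl directive, the jal and jr instructions, and the move pseudo-instruction as provided by common MIPS assemblers; none of these is introduced in this section, and the routine name twice is made up for illustration.

```mips
# ---- main.asm (hypothetical module 1) ----
           .text
           .globl main         # make 'main' visible to other modules
main:
        li    $a0, 50          # argument for the routine
        jal   twice            # 'twice' is not defined in this file, so the
                               #   assembler records an unresolved reference
        move  $t0, $v0         # keep the result returned in $v0
        li    $v0, 10          # terminate the program
        syscall

# ---- helper.asm (hypothetical module 2) ----
           .text
           .globl twice        # export 'twice' so the reference can be resolved
twice:
        add   $v0, $a0, $a0    # $v0 = $a0 + $a0
        jr    $ra              # return to the instruction after the jal
```

How the two files are assembled and combined depends on the tool being used. The listing also assembles as a single file, in which case twice is simply a local label and no external reference arises.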

Program Memory

If all references can be resolved, the linker will produce an executable program containing machine code instructions. Before it can be executed, however, it must be read from external storage and loaded into memory. A third program, called a loader, is used to load a program into memory, set the machine registers, and initialize the program counter. The loader is part of the operating system.

When a program is loaded into memory, the operating system allocates it a specific amount of memory that comprises the program's address space. A program's memory is divided into three segments, with each segment having a specific purpose. While the layout of these segments can vary from one system to the next, modern computers use a layout similar to that shown in Figure 1.

Figure 1. Common layout for a program's memory.

The text segment near the bottom holds the program's executable instructions.

The data segment, which is just above the text segment, consists of two parts. The static data part contains data that is statically allocated at compile or assembly time. This is the storage area for global variables, the data that exists during the entire lifetime of the program's execution. The dynamic data part, also referred to as the heap, contains data that is allocated during execution of the program. This is the area where dynamic variables and the instance variables of objects are allocated as the program executes.

The stack segment, which resides at the top of the program's address space, contains the program stack that is used to store local variables and to manage function calls. The heap and the stack both grow and shrink as the program executes. As "dynamic memory" is allocated, the heap expands upward, and as data is pushed onto the stack, the stack expands downward. This organization of memory allows the two expandable segments to be placed as far apart as possible so they can grow to use the program's entire address space. If a program uses its entire memory, the dynamic data part and the stack segment will collide, resulting in a stack-heap collision error.
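
Both directions of growth can be sketched in a few lines. The example assumes the $sp (stack pointer) register, the sw and lw instructions, and the sbrk service (system call 9) offered by the SPIM and MARS simulators; none of these has been covered yet, so treat the details as assumptions to be confirmed in later sections.

```mips
           .text
main:
        li    $t0, 42

        # Stack: grows toward lower addresses
        addi  $sp, $sp, -4     # make room for one word by lowering $sp
        sw    $t0, 0($sp)      # push: store $t0 at the new top of the stack
        lw    $t1, 0($sp)      # read the value back from the top of the stack
        addi  $sp, $sp, 4      # pop: raise $sp to release the word

        # Heap: grows toward higher addresses
        li    $v0, 9           # service 9 = sbrk in SPIM and MARS
        li    $a0, 16          # request 16 bytes
        syscall                # $v0 now holds the address of the new block

        li    $v0, 10          # terminate the program
        syscall
```

Each push lowers $sp, while each sbrk request extends the heap upward, which is why the two regions eventually meet if a program keeps allocating from both.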

The reserved area at the low end of memory contains instructions that allow programs to make system calls in order to utilize system resources. We will discuss the use of this area in more detail later in the chapter.

Modern operating systems allocate virtual memory to a program, one block or page at a time as needed. The blocks of virtual memory can be placed anywhere within the physical memory of a computer. This allows operating systems to more easily manage the execution of multiple applications at the same time by one or more users. In a later chapter, we will explore the construction and use of physical memory and the role of the operating system in the allocation of memory.
