This is a cheatsheet for 32-bit MIPS. MIPS is a RISC (Reduced Instruction Set Computer) architecture with 32 general-purpose registers and 3 instruction formats, each covered in more detail below.
MIPS uses 32-bit memory addresses and 32-bit data words (4 bytes). MIPS can be configured as either little-endian or big-endian; this post assumes little-endian whenever data in memory is shown.
Speaking of memory: the address used by a word load or store (lw, sw) must be word-aligned, meaning divisible by 4.
Before we dive in: most of the examples in this post are from Digital Design and Computer Architecture, 2nd edition, by David Harris and Sarah Harris, which I highly recommend to anyone interested in computer architecture.
Registers
Name | Number | Use |
---|---|---|
$0 | 0 | constant 0 |
$at | 1 | assembler temporary |
$v0–$v1 | 2–3 | function return value |
$a0–$a3 | 4–7 | function arguments |
$t0–$t7 | 8–15 | temporary variables |
$s0–$s7 | 16–23 | saved variables |
$t8–$t9 | 24–25 | temporary variables |
$k0–$k1 | 26–27 | operating system (OS) temporaries |
$gp | 28 | global pointer |
$sp | 29 | stack pointer |
$fp | 30 | frame pointer |
$ra | 31 | function return address |
Instruction formats
There are 3 instruction formats in MIPS:
- R-Type
- I-Type
- J-Type
R-Type
op | rs | rt | rd | shamt | funct |
---|---|---|---|---|---|
6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
- op: opcode/operation (equals 0 in R-type)
- rs: source register
- rt: second source register (since t comes after s in alphabetical order)
- rd: destination register
- shamt: shift amount (for shift instructions)
- funct: function (holds the actual functionality of the instruction for R-Type)
Examples:
add $s0, $s1, $s2
field | op | rs | rt | rd | shamt | funct |
---|---|---|---|---|---|---|
re-arrange | 000000 | $s1 | $s2 | $s0 | 00000 | add |
decimal | 0 | 17 | 18 | 16 | 0 | 32 |
binary | 000000 | 10001 | 10010 | 10000 | 00000 | 100000 |
size | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
Machine code: 0x02328020
sll $s0, $s2, 2
op | rs | rt | rd | shamt | funct |
---|---|---|---|---|---|
0 | 0 | 18 | 16 | 2 | 0 |
6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
Notes:
- the funct values (32 for add, 0 for sll) come from the MIPS manual, where every R-Type instruction has a unique function code
- the register numbers are taken from the registers table above
So the machine code of the add instruction is 0x02328020 in hex, and this is how it is stored in the executable file, or in memory once the operating system loads it for execution.
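The field packing described above can be checked with a few shifts. This is a minimal sketch in Python; `encode_r_type` is a hypothetical helper name, not part of any MIPS toolchain, and the register numbers come from the registers table.

```python
def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack R-Type fields into one 32-bit word (op is always 0 for R-Type)."""
    op = 0
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# Register numbers from the registers table
S0, S1, S2 = 16, 17, 18

# add: rs=$s1, rt=$s2, rd=$s0, funct=32
add_word = encode_r_type(rs=S1, rt=S2, rd=S0, shamt=0, funct=32)
# sll: rs unused (0), rt=$s2, rd=$s0, shamt=2, funct=0
sll_word = encode_r_type(rs=0, rt=S2, rd=S0, shamt=2, funct=0)

print(f"0x{add_word:08X}")  # 0x02328020
print(f"0x{sll_word:08X}")  # 0x00128080
```

Note how the destination register sits in the third register field (rd) even though it is written first in assembly.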
I-Type
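op | rs | rt | immediate |
---|---|---|---|
6 bits | 5 bits | 5 bits | 16 bits |
Examples:
addi $s0, $s1, 5
op | rs | rt | immediate |
---|---|---|---|
8 | 17 | 16 | 5 |
6 bits | 5 bits | 5 bits | 16 bits |
Machine code: 0x22300005
sw $s1, 4($t1)
op | rs | rt | immediate |
---|---|---|---|
43 | 9 | 17 | 4 |
6 bits | 5 bits | 5 bits | 16 bits |
Machine code: 0xAD310004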
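The two I-Type words above can be reproduced the same way. A minimal sketch, assuming a hypothetical helper `encode_i_type`; the opcodes 8 (addi) and 43 (sw) are the ones in the example tables, and the immediate is masked to 16 bits so negative values are stored in two's complement.

```python
def encode_i_type(op, rs, rt, imm):
    """Pack I-Type fields into one 32-bit word; imm is truncated to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

T1, S0, S1, A0 = 9, 16, 17, 4  # register numbers from the registers table

addi_word = encode_i_type(op=8, rs=S1, rt=S0, imm=5)    # op=8,  rs=$s1, rt=$s0
sw_word = encode_i_type(op=43, rs=T1, rt=S1, imm=4)     # op=43, rs=$t1, rt=$s1
neg_word = encode_i_type(op=8, rs=A0, rt=A0, imm=-1)    # addi $a0, $a0, -1

print(f"0x{addi_word:08X}")  # 0x22300005
print(f"0x{sw_word:08X}")    # 0xAD310004
print(f"0x{neg_word:08X}")   # 0x2084FFFF
```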
J-Type
A special format used only by the J (Jump) and JAL (Jump and Link) instructions.
NOTE: JR (Jump Register) is an R-Type instruction that uses only the rs operand.
op | address |
---|---|
6 bits | 26 bits |
Example:
jal sum (where sum is at address 0x004000A0, the example used under pseudo-direct addressing below)
op | address |
---|---|
3 | 0x0100028 |
6 bits | 26 bits |
Machine code: 0x0C100028
Note that the label address is encoded using pseudo-direct addressing, which fits a 32-bit address into 26 bits. This is covered in the next section.
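As a quick check of the J-Type example, here is a minimal sketch that packs the opcode and the 26-bit address field. `encode_j_type` is a hypothetical helper name; opcode 3 is JAL, as in the table.

```python
def encode_j_type(op, addr_field):
    """Pack a J-Type word: 6-bit opcode followed by a 26-bit address field."""
    return (op << 26) | (addr_field & 0x03FFFFFF)

jal_word = encode_j_type(op=3, addr_field=0x0100028)
print(f"0x{jal_word:08X}")  # 0x0C100028
```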
Addressing modes
MIPS uses 5 addressing modes to locate operands and to compute branch and jump targets:
Register-only
All source and destination operands are registers (R-Type instructions use it).
Immediate addressing
A 16-bit immediate is used together with register operands (e.g. addi and lui).
Base-addressing
Used by memory access instructions (e.g. lw and sw).
memory address = base register + sign-extended 16-bit immediate (offset)
Example: lw $s0, 8($s1) address = $s1 (base pointer) + 8
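The sign extension is what allows negative offsets from the base register. A minimal sketch of the address calculation, assuming a made-up base value for $s1 (the helper names are illustrative only):

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(base, imm16):
    """Base addressing: base register + sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

s1 = 0x10010000  # hypothetical value held in $s1

print(hex(effective_address(s1, 8)))       # lw $s0, 8($s1)  -> 0x10010008
print(hex(effective_address(s1, 0xFFFC)))  # lw $s0, -4($s1) -> 0x1000fffc
```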
PC-relative
Conditional branch instructions (e.g. beq, bne, …) use it to compute the new value of the PC (Program Counter).
Branch Target Address (BTA) = (PC + 4) + (sign-extended immediate × 4)
The immediate counts instructions, not bytes, so a negative immediate means the label is above the branch.
Example:
0xA4       beq  $t0, $0, else
0xA8       addi $v0, $0, 1
0xAC       addi $sp, $sp, 8
0xB0       jr   $ra
0xB4 else: addi $a0, $a0, -1
0xB8       jal  factorial
With the beq at PC = 0xA4, the immediate is 3, so BTA = (0xA4 + 4) + 3 × 4 = 0xB4.
In other words, the target is 3 instructions after the instruction at 0xA8. If the immediate were -5, the target would be 5 instructions before 0xA8, at 0x94.
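The branch arithmetic can be sketched in both directions: from an immediate to a target address, and from a label address back to the immediate an assembler would emit. The immediate counts instructions, so it is multiplied by 4 before being added to PC + 4. The helper names below are hypothetical, and branch delay slots are ignored.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def branch_target(pc, imm16):
    """BTA = (PC + 4) + sign-extended immediate * 4."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF

def branch_immediate(pc, target):
    """Inverse: the 16-bit field that makes a branch at pc reach target."""
    return ((target - (pc + 4)) >> 2) & 0xFFFF

print(hex(branch_target(0xA4, 3)))        # 0xb4, the else label
print(hex(branch_target(0xA4, 0xFFFB)))   # 0x94, immediate -5
print(branch_immediate(0xA4, 0xB4))       # 3
```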
Pseudo-direct
Here the target address is carried in the instruction itself. It is used by J and JAL (the J-Type instructions). Recall from the J-Type example that the address field is only 26 bits wide, while the PC needs a full 32-bit address.
The 26-bit field is derived from the 32-bit address as follows:
1- Get the address of the labeled instruction, the Jump Target Address (JTA).
2- Discard the 2 least significant bits, JTA[1:0]. Instructions are word-aligned (4 = 0100, 8 = 1000, 12 = 1100 in binary), so the 2 LSBs are always zero.
3- Discard the 4 most significant bits, JTA[31:28]. The CPU takes them from PC + 4, which works because a jump target is usually close to the current instruction, but it also limits the jump range to the current 256 MB region.
Example:
0x0040005C jal sum
...
0x004000A0 sum: add $v0, $a0, $a1
The JTA for the JAL instruction here is 0x004000A0. Applying the steps above converts it to a 26-bit field:
hex | 0 | 0 | 4 | 0 | 0 | 0 | A | 0 |
---|---|---|---|---|---|---|---|---|
binary | 0000 | 0000 | 0100 | 0000 | 0000 | 0000 | 1010 | 0000 |
Dropping the 4 most significant bits and the 2 least significant bits leaves 00 0001 0000 0000 0000 0010 1000, which is 0x0100028.
Note that the remaining bits are regrouped into hex digits from right to left. To recover the original address, reverse the process: append 00 on the right and prepend the 4 most significant bits of the PC (all zero here), which gives 0x004000A0 back.
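Both directions of the pseudo-direct conversion come down to a shift and a mask. A minimal sketch using the addresses from the example; the function names are hypothetical.

```python
def jta_to_field(jta):
    """Drop JTA[1:0] by shifting right, then drop JTA[31:28] by masking to 26 bits."""
    return (jta >> 2) & 0x03FFFFFF

def field_to_jta(pc, field):
    """Rebuild the 32-bit target: top 4 bits of PC + 4, the 26-bit field, then 00."""
    return ((pc + 4) & 0xF0000000) | ((field & 0x03FFFFFF) << 2)

field = jta_to_field(0x004000A0)
print(f"0x{field:07X}")                            # 0x0100028
print(f"0x{field_to_jta(0x0040005C, field):08X}")  # 0x004000A0
```

Because the top 4 bits are borrowed from the PC, a target whose top 4 bits differ from those of the jump instruction cannot be encoded this way.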
Variables and Arrays
Variables
To store a variable you can load it “immediately” (I-Type) into a register if it fits in 16 bits. ori is used here because addi sign-extends its immediate, which would turn 0xF00D into 0xFFFFF00D:
ori $s0, $0, 0xF00D # $s0 = 0x0000F00D
or if it’s 32 bits:
lui $s0, 0x1337 # $s0 = 0x13370000
ori $s0, $s0, 0xF00D # $s0 |= 0xF00D = 0x1337F00D
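The lui/ori pair works because lui fills the upper half and ori zero-extends its immediate into the lower half. The sketch below models both and also shows what would go wrong with addi for the lower half, since addi sign-extends. The function names simply mirror the instructions and are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def lui(imm):
    """Load upper immediate: imm goes to bits 31:16, low half is zero."""
    return (imm & 0xFFFF) << 16

def ori(reg, imm):
    """OR with a zero-extended 16-bit immediate."""
    return reg | (imm & 0xFFFF)

def addi(reg, imm):
    """Add a sign-extended 16-bit immediate, wrapping to 32 bits."""
    imm &= 0xFFFF
    signed = imm - 0x10000 if imm & 0x8000 else imm
    return (reg + signed) & MASK32

value = 0x1337F00D
upper, lower = value >> 16, value & 0xFFFF

print(f"0x{ori(lui(upper), lower):08X}")   # 0x1337F00D
print(f"0x{addi(lui(upper), lower):08X}")  # 0x1336F00D, off by 0x10000
```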
Alternatively, you can store it in memory in the data section and load it into a register:
.data
num: .word 0x1337F00D
.text
.globl main
main:
    la $t0, num # $t0 = &num
    lw $s0, 0($t0) # $s0 = num = 0x1337F00D
Note that .word in the data section sets the size of the variable (word = 4 bytes = 32 bits). Other common data directives:
- .space n (reserves n bytes without initializing them)
- .byte (8 bits)
- .word (4 bytes)
- .asciiz (null-terminated string)
- .ascii (string without a null terminator)
- .align n (aligns the next data item on a 2^n-byte boundary)
Arrays
The array is stored in memory in the data section:
.data
arr: .word 1, 2, 3
.text
.globl main
main:
    la $s0, arr # $s0 = base address of arr
    lw $t0, 0($s0) # $t0 = arr[0] = 1
    lw $t1, 4($s0) # $t1 = arr[1] = 2
    lw $t2, 8($s0) # $t2 = arr[2] = 3
exit:
    li $v0, 10 # exit system call
    syscall
Note that a string is nothing but an array of characters in memory!
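To make the offsets 0, 4 and 8 concrete, here is a rough model of how the three words would sit in little-endian memory, and how a word load at a byte offset reads them back. It uses Python's standard struct module; `load_word` is a hypothetical stand-in for lw and rejects unaligned offsets.

```python
import struct

# Lay out `arr: .word 1, 2, 3` as little-endian 32-bit words
words = [1, 2, 3]
memory = b"".join(struct.pack("<I", w) for w in words)

def load_word(mem, offset):
    """Model lw: read a little-endian 32-bit word at a word-aligned byte offset."""
    if offset % 4 != 0:
        raise ValueError("unaligned word access")
    return struct.unpack_from("<I", mem, offset)[0]

print(memory.hex())           # 010000000200000003000000
print(load_word(memory, 4))   # 2, i.e. arr[1] lives at base + 1 * 4
```

Element i of a word array is at base + 4 * i, which is why the offsets in the assembly advance by 4.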
Multiplication and division
Since multiplying two 32-bit registers can produce a 64-bit value, and dividing two numbers produces both a quotient and a remainder, mult and div write their results to 2 special-purpose registers, hi and lo (read them with mfhi and mflo):
mult $s0, $s1 # hi:lo = $s0 * $s1 (64-bit product)
div $s0, $s1 # lo = quotient, hi = remainder
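On MIPS the quotient ends up in lo and the remainder in hi, while mult splits the 64-bit product across hi (upper half) and lo (lower half). A small model of the signed versions, as a sketch only: the function names mirror the instructions, results are returned as (hi, lo), and division by zero, which MIPS leaves undefined, simply raises in Python.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(rs, rt):
    """Signed multiply; returns (hi, lo) halves of the 64-bit product."""
    product = to_signed32(rs) * to_signed32(rt)
    return (product >> 32) & MASK32, product & MASK32

def div(rs, rt):
    """Signed divide, truncating toward zero; returns (hi=remainder, lo=quotient)."""
    a, b = to_signed32(rs), to_signed32(rt)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    remainder = a - quotient * b
    return remainder & MASK32, quotient & MASK32

print(mult(0x10000, 0x10000))  # (1, 0): the product 2**32 needs the hi register
print(div(17, 5))              # (2, 3): remainder 2 in hi, quotient 3 in lo
```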
If / else conditions
The main observation here is that the low-level condition is the inverse of the high-level condition: the branch tests when to skip the if body.
High-level code:
if (x == y):
    x = y << 2
else:
    x = y
Low-level code:
..
addi $s0, $0, 0 # x = 0
addi $s1, $0, 1 # y = 1
bne $s0, $s1, else # if (x != y) go to else
sll $s0, $s1, 2 # x = y << 2
j done
else:
add $s0, $0, $s1 # x = y
done:
..
Note that this code uses a conditional branch (bne) for the comparison and an unconditional jump (j) to skip the else block, since execution naturally proceeds from top to bottom.
Loops
In programming, a loop consists of 4 main pieces:
- Initialization
- Condition
- Code that gets repeated
- Counter update
and the same is true in assembly!
High-level code:
int pow = 1; // (initialization)
int x = 0;
while (pow != 128) { // (condition)
pow = pow * 2; // (code that gets repeated)
x = x + 1; // (counter update)
}
Low-level code
..
addi $s0, $0, 1 # pow = 1 (initialization)
addi $s1, $0, 0 # x = 0
addi $t0, $0, 128 # t0 = 128 for comparison
while:
beq $s0, $t0, done # if pow == 128, exit while loop (condition)
sll $s0, $s0, 1 # pow = pow * 2 (code that gets repeated)
addi $s1, $s1, 1 # x = x + 1 (counter update)
j while
done:
..
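To see the values the loop finishes with, the assembly can be mirrored line by line in Python, keeping the same shape: test the exit condition at the top, then run the body, then jump back. This is only a sketch; `run_loop` is a made-up name and the variables stand in for $s0, $s1 and $t0.

```python
def run_loop():
    pow_, x, limit = 1, 0, 128   # $s0, $s1, $t0
    while True:
        if pow_ == limit:        # beq $s0, $t0, done
            break
        pow_ <<= 1               # sll $s0, $s0, 1
        x += 1                   # addi $s1, $s1, 1
        # j while
    return pow_, x

print(run_loop())  # (128, 7): seven doublings take 1 to 128
```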
Functions
There are no call/ret instructions in MIPS assembly like in Intel x86, but fortunately we have labels. Combined with Jump and Link (JAL) and Jump Register (JR), they are all we need.
Function call
Jump and Link (JAL) copies the address of the next instruction (PC + 4) into the $ra register and jumps to the address of the label. To return, Jump Register (JR $ra) copies the value of $ra back into the PC, so the program continues from where it left off.
This is an implementation for a basic function that adds two numbers and returns the result:
High-level code:
int add_func(int x, int y) {
    return x + y;
}
int main() {
    int a = add_func(2, 4);
}
Low-level code:
.text
.globl main
add_func:
    add $v0, $a0, $a1 # return value = x + y
    jr $ra # return (PC = $ra)
main:
    addi $a0, $0, 2 # first argument = 2
    addi $a1, $0, 4 # second argument = 4
    jal add_func # call function ($ra = PC + 4, PC --> add_func)
    add $s0, $0, $v0 # a = return value
Notes:
- saved registers ($s0-$s7) must hold the same values after the call, so a function that uses them has to save and restore them
- temporary registers ($t0-$t9) can be changed freely inside the function
- arguments are passed in the $a0-$a3 registers
- return values are placed in the $v0-$v1 registers
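The interplay between jal, $ra and jr can be traced with a toy interpreter. This is a rough sketch that supports only the four instructions used in the add_func example; the addresses are made up, and branch delay slots are not modeled.

```python
def run(program, entry):
    """Execute a dict of address -> instruction tuple until the PC leaves the program."""
    regs = {"$0": 0, "$a0": 0, "$a1": 0, "$v0": 0, "$s0": 0, "$ra": 0}
    pc, trace = entry, []
    while pc in program:
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 4
        if op == "addi":
            rt, rs, imm = args
            regs[rt] = regs[rs] + imm
        elif op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "jal":
            regs["$ra"] = pc + 4        # remember where to come back to
            next_pc = args[0]
        elif op == "jr":
            next_pc = regs[args[0]]     # PC = $ra
        regs["$0"] = 0                  # $0 always reads as zero
        pc = next_pc
    return regs, trace

ADD_FUNC, MAIN = 0x00400000, 0x00400008  # hypothetical addresses
program = {
    ADD_FUNC:     ("add", "$v0", "$a0", "$a1"),
    ADD_FUNC + 4: ("jr", "$ra"),
    MAIN:         ("addi", "$a0", "$0", 2),
    MAIN + 4:     ("addi", "$a1", "$0", 4),
    MAIN + 8:     ("jal", ADD_FUNC),
    MAIN + 12:    ("add", "$s0", "$0", "$v0"),
}

regs, trace = run(program, MAIN)
print(regs["$s0"], hex(regs["$ra"]))  # 6 0x400014
print([hex(a) for a in trace])
```

The trace shows execution leaving main at the jal, running the two instructions of add_func, and resuming at the instruction right after the jal.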
Stack frames
Each function should have its own stack frame, used for things like saving the saved registers ($s0-$s7) on entry and restoring them before returning. Two registers control the stack:
- $fp: base of the stack frame
- $sp: top of the stack
Note that $fp > $sp, since the stack grows towards lower memory addresses.
Example of allocating and de-allocating stack frame of a function:
addi $sp, $sp, -12 # allocation
sw $s0, 8($sp) # saves $s0
sw $t0, 4($sp) # saves $t0
sw $t1, 0($sp) # saves $t1
..
lw $t1, 0($sp) # restores $t1
lw $t0, 4($sp) # restores $t0
lw $s0, 8($sp) # restores $s0
addi $sp, $sp, 12 # de-allocation
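The allocate, save, restore, de-allocate sequence can be modeled with a dictionary standing in for memory. This is a sketch under simplifying assumptions: the starting $sp is a made-up value, and `push_frame` / `pop_frame` are hypothetical helpers that store the first listed register at the highest offset, matching the layout in the example.

```python
def push_frame(regs, memory, saved):
    """Allocate 4 bytes per register by lowering $sp, then store each register."""
    regs["$sp"] -= 4 * len(saved)
    for i, name in enumerate(reversed(saved)):
        memory[regs["$sp"] + 4 * i] = regs[name]      # sw name, 4*i($sp)

def pop_frame(regs, memory, saved):
    """Reload each register from the frame, then release it by raising $sp."""
    for i, name in enumerate(reversed(saved)):
        regs[name] = memory[regs["$sp"] + 4 * i]      # lw name, 4*i($sp)
    regs["$sp"] += 4 * len(saved)

regs = {"$sp": 0x7FFFEFFC, "$s0": 11, "$t0": 22, "$t1": 33}  # hypothetical values
memory = {}
saved = ["$s0", "$t0", "$t1"]

push_frame(regs, memory, saved)
sp_inside = regs["$sp"]
regs.update({"$s0": 0, "$t0": 0, "$t1": 0})   # the function body clobbers them
pop_frame(regs, memory, saved)

print(hex(sp_inside))   # 0x7fffeff0, 12 bytes below the starting $sp
print(regs)
```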
System calls
To perform tasks like reading user input, printing to the screen, or exiting the program, we use system calls as shown in the table:
Service | System call code | Arguments | Result |
---|---|---|---|
print_int | 1 | $a0 = integer | |
print_float | 2 | $f12 = float | |
print_double | 3 | $f12 = double | |
print_string | 4 | $a0 = address of string | |
read_int | 5 | | integer (in $v0) |
read_float | 6 | | float (in $f0) |
read_double | 7 | | double (in $f0) |
read_string | 8 | $a0 = buffer, $a1 = length | |
sbrk | 9 | $a0 = amount | address (in $v0) |
exit | 10 | | |
The system call code is loaded into $v0, followed by a syscall instruction to execute it.
Example of exit system call to exit a program:
li $v0, 10
syscall
Another example, printing the word “foo” on screen:
.data
foo: .asciiz "foo"
.text
.globl main
main:
la $a0, foo
li $v0, 4
syscall
exit:
li $v0, 10
syscall
output: foo