7.8: How to Calculate Branch Amounts in Machine Code

Charles W. Kann III
Gettysburg College

This chapter has shown how to use the branch statements to implement structured programming logic. However, how a branch statement manipulates the $pc register to control execution has not yet been discussed. This section covers the details of how the branch statement is implemented in machine code.

7.8.1 Instruction Addresses

When the memory for the MIPS computer was shown in section 3.2, a segment labeled program text (or simply text) was shown as starting at address 0x00400000. This section of memory contains the machine code translation of the instructions from the .text segment of your program, so the text segment of memory is where all the machine code instructions for the program are stored. When a program is assembled, the first instruction is placed at address 0x00400000. Each instruction takes 4 bytes, so the assembler keeps an internal counter: after placing an instruction, it adds 4 to the counter and uses the result as the address of the next instruction. The next instruction is placed at that address in memory, and the process continues so that each subsequent instruction is inserted at the next available word boundary. Thus the first instruction in a program is at address 0x00400000, the second instruction is at address 0x00400004, and so on. Note that machine instructions must always start on a word boundary. A simple example of an assembled program is the following.

addi $t0, $zero, 10
addi $t1, $zero, 15
add $t0, $t0, $t1

This program is shown below in a MARS screen image. The Address column of the grid shows the address of each instruction. In this example, the first instruction is stored at 0x00400000, the second at 0x00400004, and the third at 0x00400008.

[Figure: MARS screen image of the assembled three-instruction program]

So if all of the instructions are real instructions, placing them at the correct addresses is as simple as adding 4 to the address of each previous instruction. The problem is pseudo instructions, as one pseudo instruction can map to more than one real instruction. For example, an la pseudo instruction always translates to 2 real instructions, and thus takes up 8 bytes. Some instructions, such as immediate instructions whose argument may fit in 16 bits or may need 32 bits, translate to a different number of real instructions depending on their arguments. Thus it is important to be able to translate pseudo instructions into real instructions. This is shown in the following example program. Note how the Source column is translated into real instructions in the Basic column. The real instructions are the ones that are numbered in the Source column.

[Figure: MARS screen image showing pseudo instructions in the Source column and their real instructions in the Basic column]

It is important to be able to number the instructions correctly to calculate branch offsets.
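The assembler's location counter can be sketched in a few lines of Python. This is a simplified model, not how MARS is actually implemented: the names `TEXT_BASE`, `WORD_SIZE`, `PSEUDO_SIZES`, and `assign_addresses`, and the labels `next` and `result`, are all hypothetical. The only pseudo instruction given a size is la (2 real instructions, as stated above); every other line is assumed to be one real instruction, so argument-dependent expansions are ignored.

```python
TEXT_BASE = 0x00400000  # first address of the text segment in MARS
WORD_SIZE = 4           # every real MIPS instruction occupies one 4-byte word

# Assumed sizes (in real instructions) for pseudo instructions used in this
# sketch. Anything not listed is treated as a single real instruction, which
# ignores pseudo instructions whose size depends on their arguments.
PSEUDO_SIZES = {"la": 2}


def assign_addresses(program):
    """Return ([(address, line), ...], {label: address}) for a list of lines."""
    location = TEXT_BASE  # the assembler's internal location counter
    listing, labels = [], {}
    for line in program:
        if line.endswith(":"):
            # A label names the address of the next instruction.
            labels[line[:-1]] = location
            continue
        mnemonic = line.split()[0]
        assert location % WORD_SIZE == 0  # instructions start on a word boundary
        listing.append((location, line))
        # Advance by one word per real instruction this line expands to.
        location += WORD_SIZE * PSEUDO_SIZES.get(mnemonic, 1)
    return listing, labels


program = [
    "addi $t0, $zero, 10",
    "la $a0, result",  # pseudo instruction: 2 real instructions, 8 bytes
    "next:",
    "add $t0, $t0, $t1",
]
listing, labels = assign_addresses(program)
for address, line in listing:
    print(f"0x{address:08x}  {line}")
```

Here the label `next` is recorded at 0x0040000c rather than 0x00400008, because the la occupies two words. Miscounting a pseudo instruction in this way would shift every later label and corrupt every branch offset that depends on it.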

7.8.2 Value in the $pc register

Branches in MIPS assembly work by adding the signed immediate part of the instruction to the $pc register. So if a branch is taken, the $pc is updated by the immediate value, and control of the program continues at the labeled instruction. Earlier, in Chapter 3, when it was explained how the $pc register is used to control the flow of the program, it was shown that at the start of each instruction the $pc points to the instruction to execute. So a reader could think that the value adjusted by the immediate part of the branch is the address of the current instruction. However, when an instruction executes, the first thing that is done is that the $pc is incremented by 4 to point to the next instruction. This makes sense, since in the majority of instances the program is processed sequentially. However, this means that when a branch is executed, the amount that must be added or subtracted is measured from the next sequential instruction, not from the current instruction.

The following example shows how this works. In the first branch instruction, the branch is to label2. The label is 3 real instructions, or 3 words (12 bytes), after the current instruction. However, since the $pc was already incremented to point to the next instruction, the $pc is incremented by only 8 bytes, not 12.

[Figure: MARS screen image of the forward branch to label2]

The second branch instruction branches backward to label1. In this case the label is 2 instructions, or 2 words (8 bytes), before the current instruction. However, because the $pc was already incremented to point to the next instruction, 3 words, or 12 bytes, must be subtracted from the $pc; that is, the offset is -12 bytes.

[Figure: MARS screen image of the backward branch to label1]

The following MARS screen shot shows that these are indeed the branch offsets for each of the branch instructions.

[Figure: MARS screen image showing the offsets in the assembled branch instructions]
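The rule "measure from the next instruction, not the current one" reduces to one subtraction. The Python sketch below is illustrative only: `branch_offset_bytes` is a hypothetical helper, and the branch address 0x00400008 is an arbitrary assumed value. It reproduces the two cases above: a label 12 bytes ahead gives an offset of 8, and a label 8 bytes behind gives an offset of -12.

```python
WORD_SIZE = 4  # size of one real MIPS instruction in bytes


def branch_offset_bytes(branch_address, target_address):
    """Byte distance from the updated $pc (branch_address + 4) to the target."""
    next_pc = branch_address + WORD_SIZE  # $pc already points past the branch
    return target_address - next_pc


branch = 0x00400008  # hypothetical address of a branch instruction

# Forward branch: label2 is 3 real instructions (12 bytes) after the branch.
forward = branch_offset_bytes(branch, branch + 12)
print(forward)   # 8

# Backward branch: label1 is 2 real instructions (8 bytes) before the branch.
backward = branch_offset_bytes(branch, branch - 8)
print(backward)  # -12
```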

7.8.3 How the word boundary affects branching

Remember that the I format instruction uses a 16 bit immediate value. If this were the end of the story, then a branch could reach only about 32K bytes forward or backward from the current $pc (a 64K byte range in total). In terms of instructions, this means that a branch could reach instructions that are -8192..8191 real instructions from the $pc. This may be sufficient for most cases, but there is a way to increase the effective size of the branch offset to 18 bits. Remember that all instructions must fall on a word boundary, so every instruction address is divisible by 4. This means that the lowest 2 bits of every instruction address must always be “00”. Since we know the lowest two bits must always be “00”, there is no reason to keep them, and they are dropped. This extends the reach of a branch to -32768..32767 real instructions (about 128K bytes in either direction). Thus the forward branch in the previous example is encoded as 2 (1000₂ >> 2 = 0010₂, or more simply 8/4 = 2). The backward branch is likewise encoded as -3 (110100₂ >> 2 = 111101₂, using an arithmetic shift that preserves the sign, or more simply -12/4 = -3). Be careful to remember that branch offsets are calculated in bytes, and that the two lowest-order 00 bits have been truncated and must be reinserted when the branch address is calculated. This caution is given because the value of the offset stored in the branch instruction happens to equal the number of real instructions by which the $pc needs to be incremented or decremented. This is just a happy coincidence. It makes calculating the offsets easier, as all that needs to be done is to count the number of real instructions between the $pc and the label, but it in no way reflects the true meaning of the offset.
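Dropping the two low-order bits and later reinserting them can be modeled as a pair of functions. This Python sketch is an illustration under stated assumptions: `encode_branch_immediate` and `decode_branch_immediate` are hypothetical names, and the byte offset passed in is assumed to be already measured from $pc + 4 as described in 7.8.2. Python's `>>` on negative integers is an arithmetic shift, which matches the sign-preserving shift used for the -12 example.

```python
def encode_branch_immediate(offset_bytes):
    """Drop the two low-order 00 bits and pack the word offset into 16 bits."""
    if offset_bytes % 4 != 0:
        raise ValueError("branch targets must be on a word boundary")
    words = offset_bytes >> 2  # arithmetic shift keeps the sign
    if not -2**15 <= words <= 2**15 - 1:
        raise ValueError("branch target out of range for a 16-bit immediate")
    return words & 0xFFFF  # 16-bit two's complement field


def decode_branch_immediate(field):
    """Sign-extend the 16-bit field and reinsert the two 00 bits."""
    words = field - 0x10000 if field & 0x8000 else field
    return words << 2


print(hex(encode_branch_immediate(8)))    # 0x2
print(hex(encode_branch_immediate(-12)))  # 0xfffd
print(decode_branch_immediate(0xFFFD))    # -12
```

The range check accepts word offsets from -32768 to 32767, which is the 18-bit byte range (-131072..131068 bytes) described above.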

7.8.4 Translating branch instructions to machine code

Start with the program as written by the programmer. Note that there are 3 branch statements. Only these 3 branch statements will be translated to machine code. In this case the entire program, including comments, is included so that the reader understands the program. However, comments are not kept when a translation to machine code is made, so the subsequent presentations of this program will drop the comments.

# Filename: PrintEven.asm
# Author: Charles Kann
# Date: 12/29/2013
# Purpose: Print even numbers from 1 to 10
# Modification Log: 12/29/2013 - Initial release
#
# Pseudo Code
# global main()
# {
#     // The following variable can be kept in a save register.
#     register int i
#
#     // Counter loop from 1 to 10
#     for(i=1;i<11;i++)
#     {
#         if ((i %2) == 0)
#         {
#             print("Even number: " + i)
#         }
#     }
# }

.text
.globl main
main:
    # Register Conventions:
    #    $s0 - i
    addi $s0, $zero, 1
    BeginForLoop:
        addi $t0, $zero, 11
        slt $t0, $s0, $t0
        beqz $t0, EndForLoop
        addi $t0, $zero, 2
        div $s0, $t0
        mfhi $t0
        seq $t0, $t0, 0
        beqz $t0, Odd
        la $a0, result
        move $a1, $s0
        jal PrintInt
        jal NewLine
        Odd:
        addi $s0, $s0, 1
        b BeginForLoop
    EndForLoop:
    jal Exit

.data
result: .asciiz "Even number: "
.include "utils.asm"

addi $16, $0, 0x00000001

addi $8, $0, 0x0000000b

slt $8, $16, $8

beq $8, $0, . (label EndForLoop)
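Once the offset for a branch is known, the beq instruction is packed into the I format: a 6-bit opcode (000100 for beq), the 5-bit rs and rt register numbers, and the 16-bit word offset. The Python sketch below shows only that packing step. `encode_beq` is a hypothetical helper, and because the EndForLoop offset is still blank in the listing above, the offsets used here (10 and -3 words) are illustrative values, not the real ones for this program.

```python
BEQ_OPCODE = 0b000100  # opcode for beq in the MIPS I format


def encode_beq(rs, rt, offset_words):
    """Pack beq rs, rt, offset into a 32-bit I-format machine word."""
    assert 0 <= rs < 32 and 0 <= rt < 32          # 5-bit register numbers
    assert -2**15 <= offset_words < 2**15         # must fit in 16 bits signed
    return (BEQ_OPCODE << 26) | (rs << 21) | (rt << 16) | (offset_words & 0xFFFF)


# beq $8, $0, label with a hypothetical forward offset of 10 words
print(f"0x{encode_beq(8, 0, 10):08x}")  # 0x1100000a

# the same branch with a hypothetical backward offset of -3 words
print(f"0x{encode_beq(8, 0, -3):08x}")  # 0x1100fffd
```

Note that the value placed in the low 16 bits is the word offset from $pc + 4, so the counting rules from 7.8.2 and 7.8.3 must be applied before calling a function like this.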