Assembly language

definition

Different CPUs implement different sets of instructions. The set of instructions a particular CPU implements is an Instruction Set Architecture (ISA), and the programming language defined by the ISA is commonly known as an assembly language.
- Examples: ARM (cell phones), Intel x86 (i9, i7, i5, i3), IBM/Motorola PowerPC (old Macs), MIPS, RISC-V, …

risc-v

CS 61C Reference Card

https://cs61c.org/sp24/pdfs/resources/reference-card.pdf
Lists out the entire base architecture
Venus
https://venus.cs61c.org/
Online RISC-V simulator

overall view

system:

CPU: compute things fast; but store little data
main memory: store more data, but access much slower than computed.

register

A register is a CPU component specifically designed to store a small amount of data. Each register stores 32 bits of data (for a 32-bit system) or 64 bits of data (for a 64-bit system). For the purposes of this class, we consider RV32 only (which uses 32-bit registers)
This data is purely binary; types do not exist at the assembly level
Registers are a hardware component, so once you make the CPU, you can’t change the number of registers available.
- Can’t “make” a new register when defining a new variable; have to delete an existing register for each variable you make.
RISC-V gives access to 32 integer registers:
- x0-x31
- x0 always stores 0, so only 31 registers are available to hold varibles

instructions

![[Pasted image 20240627160611.png]]
Each line of RISC-V code is a single instruction, which executes a simple operation on registers
in the format:

<instruction name> <destination register> <operands>
Ex. “add x5 x6 x7” means “Add the values stored in x6 and x7, and store the result in x5”
Commas can be added between registers (“add x5, x6, x7”), but this is optional.
for comments: # ...
Far more important than in other languages!!!

addition

add x1, x2, x3 == a = b + c (x1 ->a, x2 -> b, x3 -> c)

subtraction

sub x3,x4,x5 == d = e - f
exercise:

do a = b + c + d - e;
do f = (g + h) - (i + j);

Immediates

numerical constants
add immediate: addi x3, x4, 10 == f = g + 10
No subtract immediates in RISC-V!

load immediate

li rd, imm
e.g.: li rd, 5 == addi rd, x0, 5

register 0

add x3, x4, x0 == f = g

data transfer

main memory

![[Pasted image 20240529085310.png]]

4 bytes together make a word
RISC-V uses little-endian to store data: The least significant byte gets stored at the lowest address.

load word

lw rd imm(rs1): Compute imm+rs1, then load the 4 bytes at that address into rd
example: lw x10 12(x5)

0x100+12=0x10C
bytes at 0x10C-0x10F: 0x53, 0x42, 0x56, 0x00
32-bit valurte 0x0056 4253

store word

sw rs2 imm(rs1): Compute imm+rs1, then store the 4 bytes of rs2 into that address.
e.g.: sw x10 0(x5) if x5 is 0x100, x10 is 0x1234 5678

0x100+0 = 0x100
Since RISC-V is little-endian, the 32-bit value 0x1234 5678 gets split into bytes 0x78 0x56 0x34 0x12
So the bytes in memory 0x100, 0x101, 0x102, and 0x103 get set to 0x78, 0x56, x34, and 0x12, respectively.

load byte

lb rd imm(rs1)

store byte

sb rs2 imm(rs1)

example: converting C code to RISC-V

int a = 5;  
char b[] = "string";  
int c[10];  
uint8_t d = b[3];  
c[4] = a+d;  
c[a] = 20;

RISC-V:

li t0 5
sw t0 0(sp)
li t0 0x69727473
sw t0 4(sp)
li t0 0x0000676E
sw t0 8(sp)
lb t0 7(sp)
sb t0 52(sp)
lw t0 0(sp)
lbu t1 52(sp)
add t2 t0 t1
sw t2 28(sp)
li t0 20
lw t1 0(sp)
slli t1 t1 2 #t1*=4
addi t1 t1 12
add t1 t1 sp
sw t0 0(t1)

control flow

new C operator

label

A label is an identifier to a particular line of code

Doesn’t count as a line of code itself; merely “points out” a particular line
Each label must have a unique name (like variable names)

goto

The goto statement changes the next line to be run to the labelled line

The label can be either before or after the goto statement.
never use goto!!!
goto has a tendency to create completely illegible code

Generally considered bad practice, except in very specific situations

Error handling

Jumping out of nested loops
simple example of goto:

int* a = malloc(sizeof(int)*1000);
if(a == NULL) goto ErrorA;
int* b = malloc(sizeof(int)*1000000);
if(b == NULL) goto ErrorB;
int* c = malloc(sizeof(int)*1000000000);
if(c == NULL) goto ErrorC;
FILE* d = fopen(filename);
if(d == NULL) {
           free(c);
ErrorC:    free(b);
ErrorB:    free(a);
ErrorA:    allocation_failed();
}

RISC-V instructions for control flow

List of branch instructions:

beq rs1 rs2 Label: Branch if EQual
bne rs1 rs2 Label: Branch if Not Equal
blt rs1 rs2 Label: Branch if Less Than (signed) (rs1 < rs2)
bge rs1 rs2 Label: Branch if Greater or Equal (signed)
bltu rs1 rs2 Label: Branch if Less Than (unsigned)

bgeu rs1 rs2 Label: Branch if Greater or Equal (unsigned)
example:

int a = 0;
for(int i = 0; i < 10; i++) {
    if(i == 7) {
        break;
    }
    a = a + i;
}
a = a + 50;

	  li x10 0        #int a = 0;
      li x5 0         #int i = 0;
Loop:
      li x6 10        #int j = 10;
      bge x5 x6 End   #if(i >= j) goto End;
      li x6 7         #j = 7;
      beq x5 x6 End   #if(i == j) goto End;
      add x10 x10 x5  #a = a + i;
      addi x5 x5 1    #i = i + 1;
      j Loop          #goto Loop;
End:  addi x10 x10 50 #a = a + 50;

bitwise operations

review: C bitwise operations:

AND, OR, XOR, and NOT: perform the operation one bit at a time
left and right shift: shift the bits of the number left/right
shift have numerial equivalents in math
left shift: multiply by $2^n$ –> fill the new bits with 0s
right shift by n -> floor divide by $2^n$ –> dividing an unsigned number should fill the bits with 0s; signed numbers fill with the sign bit

RISC-V operations

add rd rs1 rs2: Add
sub rd rs1 rs2: Subtract
and rd rs1 rs2: Bitwise AND
or rd rs1 rs2: Bitwise OR
xor rd rs1 rs2: Bitwise XOR
sll rd rs1 rs2: Shift Left Logical (rd left rs2 digit, left using the highest of rs1)
srl rd rs1 rs2: Shift Right Logical
sra rd rs1 rs2: Shift Right Arithmetic
so far: ![[Pasted image 20240617151458.png]]

precedures

C functions:

int foo(int i) {
    if(i == 0) return 0;
    int a = i + foo(i-1);
    return a;
}
int j = foo(3);
int k = foo(100);
int m = j+k;

Calling a function:

Set function arguments
Goto the start of the function
During a function call:
Keep local scope separate from global scope
Perform the desired task of the function
Returning from a function:
Place the return value in a variable that can be accessed
Goto the line immediately after the function call

problems in RISC-V

maintaining scope

In RISC-v, however, local scope do not exist. Therefore, we need a way to store variables that no called function can change

returning from a function

We’ll need a way to send in the return address to a function, and jump to that return address when we finish with the function

memory model of RISC-V

almost the same as C
Program is also a kind of data. It stored in text section of the memory.
in RISC-v:

every (real) instruction is stored as a 32-bit number
A special 33rd register called the Program Counter (or PC) keeps track of which line of code is currently being run
jumps: jump to the address of the instruction
jal rd label: jump and link: jump to the given label and sets rd to PC+4 (for function calls)
j label: jump to the given label; used for unconditional jumps
jalr rd rs1 imm: Jumps to the instruction at address rs1+imm, and sets rd to PC+4 (used for higher-order functions and some function calls)
jr rs1: jump to the instruction at address rs1 (return from a function)

stack

One of our registers (by convention x2, nicknamed sp, or “stack pointer”) is set to the bottom of the stack. A function can choose to create a stack frame, by manipulating sp.

Anything above the sp at the start of a function belongs to another function. You may not modify anything above the sp without permission.
Everything below the sp is safe to modify.
- But anyone else can modify it, so you can’t leave data there and expect it to stay the same
By decrementing the sp, we can allocate as much space as we need for our function, that we can use however we want.
After finishing a function call, the sp must be set to its value from before the function call
自己的理解：相当于寄存器sp存储了一个栈的底部，然后当调用函数时将这个栈的底部向下移一定的位数形成自己的栈（addo sp sp -8），调用玩了之后再把sp还原回原来的位置

converting a C function into RISC-V

the function is:

int foo(int i) {
    if(i == 0) return 0;
    int a = i + foo(i-1);
    return a;
}
int j = foo(3);
int k = foo(100);
int m = j+k;

step1: define how the function plans to use registers
Inputs:
i: x10

We’ll call this register “a0” for “argument”
Return Address: x1
We’ll call this register “ra”
Output: x10 (Yes, we’ll reuse a0 for the return value)
Stack Pointer: x2
Nicknamed “sp”
Register that will NOT be changed by foo: x8, x9
We can still use these registers, as long as they get restored by the end of the function

We’ll call these registers “s0” and “s1” for “saved”

li a0 3      # int j = foo(3);
jal ra foo   # call foo
mv s0 a0     # mv rd rs1 sets rd = rs1
li a0 100    # int k = foo(100);
jal ra foo   # call foo
mv s1 a0     # Saves return value in s1
add a0 s0 s1 # int m = j+k;

the function foo:

foo: # int foo(int i)
addi sp sp -8 #Prologue
sw ra 0(sp)   #Prologue
sw s0 4(sp)   #Prologue
mv s0 a0      #Move i
bne s0 x0 next
li a0 0
j Epilogue    #     if(i == 0) return 0;
Next:
addi t0 s0 -1 #int j = i - 1;
mv a0 t0
jal ra foo    #j = foo(j);
mv t0 a0
add a0 s0 t0  #int a = i + j;
Epilogue:
lw ra 0(sp)   #Epilogue
lw s0 4(sp)   #Epilogue
addi sp sp 8  #Epilogue
jr ra         #return a;

calling convention

Standardize a set of conventions that everyone agrees to follow in the functions of each register
![[Pasted image 20240623140053.png]]

zero: The x0 register, which always stores 0
ra: x1, which is used to store return addresses
- Two new pseudoinstructions that explicitly use this:
  - jal Label -> jal ra Label
  - ret -> jr ra
sp: The x2 register, which is the stack pointer
s0-s11: Saved registers
- Registers that do not need to be restored by a called function (i.e. if you want to save a variable in this register, it needs to be saved somewhere before you call another function)
a0-a7: Registers used for function arguments
- a0, a1 also used for function outputs
If a function needs more than 8 arguments, can use the stack to store more arguments
t0-t6: temporary Registers
gp: The x3 register, used to store a reference to the heap. Also called the “global pointer”
tp: The x4 register, used to store separate stacks for threads

all remaining RISC-V instructions

slt, slti, sltu, sltiu: Set Less Than
- slt rd rs1 rs2: Compares rs1 to rs2. If rs1 < rs2 (signed), sets rd to 1. Otherwise sets rd to 0.
lh, lhu, sh
- Load/Store Halfword: Same as lb/lbu/sb, but for 2-byte blocks
ebreak, ecall: Environment Break/Call
- Asks the computer to do something (ex. Print data, set a breakpoint for debugging, allocate heap space)
- We’ll provide utility functions that call ecall/ebreak for you
lui, auipc: will be mentioned in the following part!

RISC-V Instruction Formats

简单的概括：程序也是数据，研究存储程序的数据
![[Pasted image 20240624121656.png]]

R-types

![[Pasted image 20240624121306.png]]

Designed for instructions with 3 registers and no immediate
- Arithmetic operators like add or sub
Each register is identified by its number. 32 registers → 5 bits to identify one register uniquely
- 0x0 → 0b00000
- a0 → x10 → 0b01010
opcode: Instruction identifier: Always the last 7 bits of the instruction over all instruction formats
- Can therefore be used to determine which instruction format is currently in use.
Some sets of similar instructions get assigned the same opcode
- Ex. All arithmetic R-type instructions have the opcode 0x33
funct3: 3-bit identifier to differentiate instructions with the same opcode
funct7: Extra 7-bit identifier for extremely similar instructions with the same opcode and funct3 (such as sra and srl)
![[Pasted image 20240624121449.png]]

I-type

![[Pasted image 20240624121318.png]]
Designed for instructions with 2 registers (rs1 and rd) and 1 immediate

Most components are stored the same way as before, with the addition of the imm component
Most instructions use signed immediates, so our range for I-type immediates is [-2048,2047].
![[Pasted image 20240624121541.png]]
![[Pasted image 20240624121526.png]]
![[Pasted image 20240624121549.png]]

S-type

![[Pasted image 20240624121639.png]]
similar to I-type, but store the immediate separately
![[Pasted image 20240624121726.png]]

U-type

![[Pasted image 20240624143102.png]]

Load Upper Immediate: lui rd imm
- Sets rd to imm << 12
- example: How would you translate “li t0 0xABCDEFFF” to instructions?
- lui t0 0xABCDF #t0 stores 0xABCDF000
  addi t0 t0 0xFFF #t0 stores 0xABCDEFFF
Upper Immediate to Program Counter: auipc rd imm
- Sets rd to (imm << 12) + PC
Primarily used in two pseudoinstructions:
- li rd imm: Set rd to imm
- la rd Label: Set rd to the address of Label
  ![[Pasted image 20240624143227.png]]

B-type

![[Pasted image 20240624143201.png]]

Branch instructions also use 2 source registers and an immediate, so the format is similar to S-Type
- This format is sometimes referred to as SB-type for that reason
Note that the immediate is stored in a strange pattern
- If we had the binary 0bA BCDE FGHI JKLM (where each letter was a bit), the first box would store 0bACD EFGH and the second box would store 0bI JKLB. Bit M isn’t stored.
- This is also to simplify the underlying circuit; note that we put the MSB of our immediate in the MSB of our instruction (to simplify sign-extension), and other than that put 10 of the remaining 11 bits in the same position as S-type instructions
Branch instructions have 13-bit immediates = [-4096, 4094] range, which is up to 210 instructions up/down.
![[Pasted image 20240624143551.png]]

J-types

![[Pasted image 20240624143745.png]]

Jal instructions use only 1 destination and an immediate, so we can use the U-type format for extra immediate bits
- This format is sometimes referred to as UJ-type for that reason
Note that the immediate is stored in an even stranger pattern
- If we had the binary 0bA BCDE FGHI JKLM NOPQ RSTU (where each letter was a bit), the data would be stored as 0b AKLM NOPQ RSTJ BCDE FGHI. As before, the last bit isn’t stored
- Note that we put the MSB of our immediate in the MSB of our instruction, bits 19-12 in the same spot as U-types, and bits 10-1 in the same spot as I-types.
Jumps have 21-bit immediates, so up to 218 instructions up/down

summary

you can get the summary card here.