CALL (Compiling, Assembling, Linking, and Loading)

P.S.: a Chinese version of these notes is available [here](计组学习08——CALL - ZzTzZ - 博客园 (cnblogs.com)), which may be easier to follow.
![[Pasted image 20240624150906.png]]
For C, "compiling" is actually several steps:
- compiling the .c files to assembly,
- assembling that into .o files (done automatically),
- then linking the .o files into an executable.

Step 1: compiler

input: high-level language code (foo.c)
output: assembly (foo.s)
The output may contain pseudo-instructions.

Step 2: assembler

input: assembly (foo.s)
output: machine language module, object file (foo.o)
The assembler doesn't know where the libraries are located!
Object file format:

Object File Header
Text Segment
Data Segment
Symbol Table: List of file’s labels, static data that can be referenced by other programs
Relocation Information: Lines of code to fix later (by Linker)
Debugging Information
Every assembler will:
- read and use directives
- replace pseudo-instructions
- produce machine language

Assembler Directives

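They give directions to the assembler but do not produce machine instructions.
.text: subsequent items are placed in the user text (code) segment
.data: like .text, except that subsequent items are placed in the static data segment of memory
.globl sym: declares sym as a global symbol that can be referenced from other files
.asciiz str: stores the string str in memory, null-terminated
.word w1 w2 wn: takes space-separated integers (words) and stores the n 32-bit quantities in consecutive memory locations
Replacing pseudo-instructions: as in the earlier examples, the assembler replaces pseudo-instructions with real instructions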
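
To make the replacement step concrete, here is a minimal Python sketch that expands a handful of RISC-V pseudo-instructions into their base instructions. The function name `expand_pseudo` is made up for these notes, and it only covers a few cases (`li` is limited to 12-bit immediates); a real assembler handles many more.

```python
def expand_pseudo(line: str) -> list[str]:
    """Return the real instruction(s) for one line of assembly (hypothetical helper)."""
    parts = line.replace(",", " ").split()
    if not parts:
        return []                                  # blank line: nothing to emit
    op, args = parts[0], parts[1:]
    if op == "nop":
        return ["addi x0, x0, 0"]
    if op == "mv":                                 # mv rd, rs
        rd, rs = args
        return [f"addi {rd}, {rs}, 0"]
    if op == "li":                                 # li rd, imm
        rd, imm = args[0], int(args[1], 0)
        if not -2048 <= imm <= 2047:
            raise ValueError("this sketch only handles 12-bit immediates")
        return [f"addi {rd}, x0, {imm}"]
    if op == "j":                                  # j label
        return [f"jal x0, {args[0]}"]
    if op == "ret":
        return ["jalr x0, 0(ra)"]
    return [line.strip()]                          # already a real instruction


for source_line in ["mv a0, t0", "li t0, 5", "j loop", "add t0, t1, t2"]:
    print(source_line, "->", expand_pseudo(source_line))
```

Each pseudo-instruction here maps to exactly one real instruction; larger `li` constants would need a `lui`/`addi` pair, which this sketch deliberately leaves out.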

Producing machine language

Simple case:
- Arithmetic, logical, shift instructions, and so on.
- All the necessary information is already within the instruction.
PC-relative branches and jumps:
- e.g., beq/bne/etc. and jal
- Branches and jumps need a relative address.
- Position-independent code (PIC): once pseudo-instructions are replaced with real ones, we know how many instructions lie between a branch and its target, so all known PC-relative offsets can be computed.
What if a branch or jump targets a label?
- A branch may refer to a label that appears later in the file, whose address has not been computed yet, so on a single pass the assembler would not know what the label stands for.
  - Solution: have the assembler make two passes over the program.
  - Pass 1: Remember positions of labels (store in symbol table).
  - Pass 2: Use label positions to generate machine code.
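
The two passes can be sketched in Python roughly as follows. This is a simplified model, not a real assembler: instructions stay as text, every instruction is assumed to be 4 bytes, a label is any line ending in `:`, and the names `two_pass` and `PROGRAM` are invented for the example.

```python
PROGRAM = [
    "loop:",
    "beq t0, x0, done",    # forward reference to a label defined later
    "addi t0, t0, -1",
    "jal x0, loop",        # backward reference
    "done:",
    "addi a0, x0, 0",
]


def two_pass(lines):
    """Return (symbol_table, resolved) with labels replaced by PC-relative offsets."""
    symbols = {}
    instructions = []                      # (address, text) pairs
    pc = 0
    # Pass 1: remember the position of every label.
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = pc
        else:
            instructions.append((pc, line))
            pc += 4
    # Pass 2: use the label positions to fill in PC-relative offsets.
    resolved = []
    for addr, text in instructions:
        op, _, rest = text.partition(" ")
        operands = [o.strip() for o in rest.split(",")] if rest else []
        if operands and operands[-1] in symbols:
            operands[-1] = str(symbols[operands[-1]] - addr)
        resolved.append((addr, f"{op} {', '.join(operands)}".strip()))
    return symbols, resolved


symbols, resolved = two_pass(PROGRAM)
for addr, text in resolved:
    print(f"{addr:2d}: {text}")
```

Because `done` is only seen after the `beq` that uses it, a single pass could not fill in its offset; by pass 2 the symbol table already holds every label.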

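What the assembler creates for the linker/debugger

Symbol table: list of "items" in this object file

These "items" are:
- Labels: function calls
- Data: anything in the .data section
  (variables that may be accessed across files)
Keeping track of labels solves the forward-reference problem.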

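Relocation table

The relocation table is a list of "items" whose addresses this file will need later.
These "items" are:
- Any label jumped to (jal, jalr)
  - internal labels
  - external labels (including library files)
- Any piece of data
  - for example, anything referenced in the .data section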
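
As a rough picture of what these two tables hold, here is a small Python sketch. Everything in it is an assumption made for illustration: the `ObjectFile` layout, the `build_tables` helper, and the rule that a `jal` to a label not defined in this file, or any `la`, gets a relocation entry. Real object formats record much more detail.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    text: list[str] = field(default_factory=list)
    symbols: dict[str, int] = field(default_factory=dict)              # label -> offset in text
    relocations: list[tuple[int, str]] = field(default_factory=list)   # (offset, symbol needed)


def build_tables(lines):
    """Collect labels into the symbol table, then list the addresses still unknown."""
    obj = ObjectFile()
    for raw in lines:
        line = raw.strip()
        if line.endswith(":"):
            obj.symbols[line[:-1]] = 4 * len(obj.text)
        elif line:
            obj.text.append(line)
    for i, instr in enumerate(obj.text):
        op, _, rest = instr.partition(" ")
        target = rest.split(",")[-1].strip()
        external_jump = op == "jal" and target not in obj.symbols
        if external_jump or op == "la":
            obj.relocations.append((4 * i, target))    # the linker fixes this line later
    return obj


obj = build_tables(["main:", "la a0, msg", "jal ra, printf", "jal x0, main"])
print(obj.symbols)
print(obj.relocations)
```

The jump back to `main` gets no entry because its target is already known inside the file, while `msg` and `printf` are left for the linker.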

Step 3: linker

input: object files (foo.o)
output: executable machine code (a.out)

Combines several object files into a single executable **("linking")**
Enables separate compilation of files (very useful!)
- Changing a single file does not require recompiling the whole project
  ![[Pasted image 20240625105533.png]]
  For example, if object file 1 is our program and object file 2 is a library, the linker links the two together.

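The linker takes the text segment from each .o file and puts them together.
It puts the data segments from each .o file together, then appends the data after the text segments.
It resolves all references:
- go through the relocation table and handle each entry
- fill in all final absolute addresses

Three types of addresses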

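PC-relative addresses (beq, bne, jal)
- No relocation needed: as long as the relative positions of the code don't change, nothing has to change
External function references (usually jal)
- Usually need relocation
Static data references (usually auipc and addi)
- Usually need relocation
- RISC-V often uses auipc for these

Absolute addressing in RISC-V

Which instructions need relocation editing?
- J-format instructions (jump / jump and link) when the target's address is not known until link time
- Loads and stores to variables in the static area, relative to the global pointer
- Conditional branches are also PC-relative, but they are not a concern when code moves, because the relative positions of the code do not change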
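
The linker steps above (put the text segments together, put the data after them, then walk the relocation entries) can be sketched in Python. The object-file layout here is invented for illustration: instructions stay as text, a `?` marks an address to be filled in, and each word is assumed to be 4 bytes. For simplicity the sketch patches in absolute addresses everywhere, whereas a real RISC-V `jal` encodes a PC-relative offset.

```python
def link(objects, text_base=0):
    """objects: list of dicts with keys text, data, symbols, relocs (hypothetical layout)."""
    # 1. Lay out all text segments first, then all data segments after them.
    text_start, addr = [], text_base
    for obj in objects:
        text_start.append(addr)
        addr += 4 * len(obj["text"])
    data_start = []
    for obj in objects:
        data_start.append(addr)
        addr += 4 * len(obj["data"])
    # 2. Build one global symbol table holding final absolute addresses.
    table = {}
    for obj, t0, d0 in zip(objects, text_start, data_start):
        for name, (segment, offset) in obj["symbols"].items():
            table[name] = (t0 if segment == "text" else d0) + offset
    # 3. Handle every relocation entry by filling in the final address.
    text = []
    for obj in objects:
        patched = list(obj["text"])
        for index, symbol in obj["relocs"]:
            patched[index] = patched[index].replace("?", hex(table[symbol]))
        text.extend(patched)
    data = [word for obj in objects for word in obj["data"]]
    return text, data, table


main_o = {
    "text": ["la a0, ?", "jal ra, ?", "addi a0, x0, 0"],
    "data": [42],
    "symbols": {"main": ("text", 0), "answer": ("data", 0)},
    "relocs": [(0, "answer"), (1, "helper")],
}
lib_o = {
    "text": ["addi a0, a0, 1", "jalr x0, 0(ra)"],
    "data": [],
    "symbols": {"helper": ("text", 0)},
    "relocs": [],
}
text, data, table = link([main_o, lib_o])
print(table)
print(text)
```

Note that `helper` only gets an address once both files are laid out, which is why the assembler alone could not fill it in.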

Step 4: loader (one of the tasks of the OS)

Load program into a newly created address space:

- Read the executable's file header for the sizes of the text and data segments.
- Create a new address space for the program, large enough to hold the text and data segments along with a stack segment.
- Copy instructions and data from the executable file into the new address space.
- Copy arguments passed to the program onto the stack.

Initialize machine registers

Most registers cleared; stack pointer (sp) assigned address of first free stack location

Jump to start-up routine

Copy program arguments from the stack to registers and set the PC.
If the main routine returns, the start-up routine terminates the program with the exit system call.
a.out must contain both the machine code itself (text segment) and any static data (data segment).
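
The loader steps can be simulated with a toy Python model. All of it is assumed for illustration: memory is a plain list of words, the executable is a dict with a made-up `header`, and only a few registers are tracked. A real OS loader works with page tables and a binary file format instead.

```python
def load(executable, argv, memory_size=64):
    """Simulate the loader; returns (memory, registers). Hypothetical layout throughout."""
    header = executable["header"]
    text_size, data_size = header["text_size"], header["data_size"]
    if len(executable["text"]) != text_size or len(executable["data"]) != data_size:
        raise ValueError("header sizes do not match the segments")
    if text_size + data_size + len(argv) + 1 > memory_size:
        raise MemoryError("address space too small")
    # Create a new address space big enough for text, data, and a stack.
    memory = [0] * memory_size
    # Copy instructions and data from the executable into it.
    memory[0:text_size] = executable["text"]
    memory[text_size:text_size + data_size] = executable["data"]
    # Copy the program arguments onto the stack at the top of memory.
    first_arg = memory_size - len(argv)
    memory[first_arg:memory_size] = argv
    # Initialize registers: cleared, except sp = first free stack location.
    registers = dict.fromkeys(("a0", "a1", "ra", "pc"), 0)
    registers["sp"] = first_arg - 1
    # Start-up routine: move the arguments into registers and set the PC.
    registers["a0"] = len(argv)            # argument count
    registers["a1"] = first_arg            # where the arguments start
    registers["pc"] = header["entry"]
    return memory, registers


exe = {
    "header": {"text_size": 3, "data_size": 2, "entry": 0},
    "text": ["i0", "i1", "i2"],
    "data": [7, 8],
}
memory, regs = load(exe, ["prog", "arg1"], memory_size=16)
print(regs)
```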

Interpretation vs. translation

Interpreter: directly executes a program in the source language (usually slower, especially for higher-level languages).
Translator: converts a program from the source language into an equivalent program in another language (the translated program usually runs more efficiently).
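
A small Python sketch of the difference, using a made-up source language (postfix arithmetic such as `3 4 + 2 *`). `interpret` executes the source directly; `translate` only converts it into an equivalent Python expression, which some other tool then has to run. Both function names are invented for this example.

```python
OPERATORS = ("+", "-", "*")


def interpret(program: str) -> int:
    """Interpreter: execute the postfix program directly."""
    stack = []
    for token in program.split():
        if token in OPERATORS:
            b, a = stack.pop(), stack.pop()
            if token == "+":
                stack.append(a + b)
            elif token == "-":
                stack.append(a - b)
            else:
                stack.append(a * b)
        else:
            stack.append(int(token))
    return stack.pop()


def translate(program: str) -> str:
    """Translator: produce an equivalent program in another language (Python)."""
    stack = []
    for token in program.split():
        if token in OPERATORS:
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {token} {b})")
        else:
            stack.append(str(int(token)))
    return stack.pop()


source = "3 4 + 2 *"
print(interpret(source))          # runs the source program itself
print(translate(source))          # only emits Python source text
print(eval(translate(source)))    # the translated program is run separately
```

The translator does its work once, after which the output can be run many times without re-reading the original source; the interpreter repeats the token-by-token work on every run.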