compile: translating IR into assembly

The previous section showed how to read the assembly generated in Chapter 1:

movq / addq / retq
%rax / %rbp / %rsp
$40 / -8(%rbp)

This section does not explain that assembly syntax again. It asks a different question:

How does the compile command generate this assembly from IR?

In other words, the previous section was about reading the output; this one is about how that output is produced.

Start with the input IR

We use the same example again:

(+ (+ 1 2) (+ 3 4))

Run:

cd code/01_numbers
make
./mini ir examples/nested_add.lang

You will see:

t.0 = 1 + 2
t.1 = 3 + 4
t.2 = t.0 + t.1
return t.2

For the assembly generator, this IR splits into two parts:

ops:
  t.0 = 1 + 2
  t.1 = 3 + 4
  t.2 = t.0 + t.1

result:
  t.2

ops holds the computation steps to emit, in order. result is the value the whole program returns once all steps have finished.
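
To make the two parts concrete, here is a minimal sketch of how such an IR could be laid out in C++. The field names (ops, result, dst, lhs, rhs, kind, integer_value, temp_name) are taken from the code shown later in this section; the exact struct layout and the helpers make_int, make_temp, and nested_add_ir are assumptions made for illustration, not the project's actual definitions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed layout: the real definitions live in the project sources.
enum class OperandKind { integer, temp };

struct Operand {
    OperandKind kind;
    long long integer_value;  // meaningful when kind == integer
    std::string temp_name;    // meaningful when kind == temp
};

enum class OpKind { add };

struct Op {
    OpKind kind;
    std::string dst;  // temporary that receives the result
    Operand lhs;
    Operand rhs;
};

struct IrProgram {
    std::vector<Op> ops;  // computation steps, emitted in order
    Operand result;       // value the whole program returns
};

// Hypothetical helpers for building operands.
Operand make_int(long long value) {
    return Operand{OperandKind::integer, value, ""};
}

Operand make_temp(const std::string& name) {
    assert(!name.empty());
    return Operand{OperandKind::temp, 0, name};
}

// Builds the IR for (+ (+ 1 2) (+ 3 4)).
IrProgram nested_add_ir() {
    IrProgram program{};
    program.ops.push_back(Op{OpKind::add, "t.0", make_int(1), make_int(2)});
    program.ops.push_back(Op{OpKind::add, "t.1", make_int(3), make_int(4)});
    program.ops.push_back(Op{OpKind::add, "t.2", make_temp("t.0"), make_temp("t.1")});
    program.result = make_temp("t.2");
    return program;
}
```

Calling nested_add_ir() yields three ops and a result operand that names t.2, which is the shape the rest of this section walks through.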

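This section follows just four generation rules:

1. Assign a stack slot to every temporary.
2. Reserve stack space based on the number of slots.
3. Translate each IR operation into assembly instructions.
4. Move result into %rax.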

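Assigning stack slots to temps

The previous section explained that %rax gets overwritten by later computations, so Chapter 1 saves every IR temporary in a fixed stack slot.

Here the focus is on how the compiler does this. The code is in code/01_numbers/src/compile/assembly.cpp:
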
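StackSlots assign_stack_slots(const IrProgram& program) {
    StackSlots slots;
    int offset = -8;  // first slot sits just below the saved %rbp

    // Each op's destination temp gets the next 8-byte slot, moving downward.
    for (const auto& op : program.ops) {
        slots[op.dst] = offset;
        offset -= 8;
    }

    return slots;
}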

Every operation in program.ops has a dst, the temporary that holds the result of that step:

t.0 = 1 + 2        dst is t.0
t.1 = 3 + 4        dst is t.1
t.2 = t.0 + t.1    dst is t.2

So this code produces:

t.0 -> -8
t.1 -> -16
t.2 -> -24

Later, when the assembly is emitted, stack_slot("t.0") writes this offset as an address in AT&T syntax:

-8(%rbp)

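In other words, the assembly generator stores plain numeric offsets internally and only builds the text -8(%rbp) when it actually writes output.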
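
The split between "numeric offset inside" and "text at output time" can be sketched as follows. This is a simplified stand-in, not the project's code: StackSlots is assumed here to be a std::map from temp name to offset, and assign_slots_for and stack_slot_text are hypothetical names that take plain strings instead of the real IR types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumption: temp name -> byte offset from %rbp.
using StackSlots = std::map<std::string, int>;

// Same numbering scheme as assign_stack_slots, but driven by dst names only.
StackSlots assign_slots_for(const std::vector<std::string>& dst_names) {
    StackSlots slots;
    int offset = -8;
    for (const auto& name : dst_names) {
        slots[name] = offset;
        offset -= 8;
    }
    return slots;
}

// Renders a stored offset as an AT&T memory operand, e.g. -8 -> "-8(%rbp)".
std::string stack_slot_text(const StackSlots& slots, const std::string& temp_name) {
    auto it = slots.find(temp_name);
    assert(it != slots.end() && "temp has no stack slot");
    return std::to_string(it->second) + "(%rbp)";
}
```

With the three temps of the example, stack_slot_text(slots, "t.1") would give the text -16(%rbp). Keeping the offset as an int until the last moment means the same table can also be used to compute the frame size.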

Reserving stack space

Once the slots are assigned, the generator needs to know how much stack space the function needs in total:

slots_ = assign_stack_slots(program);
stack_size_ = align_to_16(slots_.size() * 8);

This example has three slots of 8 bytes each:

3 * 8 = 24

But the actual output reserves 32 bytes:

subq $32, %rsp

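That is because align_to_16 rounds the size up to a multiple of 16.

The previous section already covered that subq $32, %rsp reserves space in the stack frame. The point to remember here is that 32 is not a hand-written constant; it is computed from the number of IR temporaries.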
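
The body of align_to_16 is not shown in this section, so the following is only one plausible way to write it: a round-up using integer division. The int parameter type is an assumption.

```cpp
#include <cassert>

// Rounds a non-negative byte count up to the next multiple of 16.
// Sketch only; the project's own align_to_16 may be written differently.
int align_to_16(int bytes) {
    assert(bytes >= 0);
    return (bytes + 15) / 16 * 16;
}
```

For the example, align_to_16(3 * 8) is (24 + 15) / 16 * 16 = 2 * 16 = 32, which matches the subq $32, %rsp in the output. Keeping the frame a multiple of 16 fits the x86-64 System V convention that the stack is 16-byte aligned at call sites.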

Generating the function shell

AssemblyEmitter::emit first emits the entry label, then the stack frame prologue:

out_ += ".globl " + main_label(target) + "\n";
out_ += main_label(target) + ":\n";
emit_prologue();

main_label(target) handles the platform difference:

linux -> main
macos -> _main

So the same IR produces different entry labels on Linux and macOS. The difference comes from platform conventions, not from the little language itself.
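
A sketch of what main_label might look like, following the same kind + switch style used elsewhere in this chapter. The Target enum and its enumerator names are assumptions, and entry_header is a hypothetical helper that only mirrors the two out_ += lines shown above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Assumed target enum; the project's real type may differ.
enum class Target { Linux, MacOS };

// Picks the entry symbol name for the platform.
std::string main_label(Target target) {
    switch (target) {
    case Target::Linux:
        return "main";
    case Target::MacOS:
        return "_main";  // macOS prefixes C symbols with an underscore
    }
    throw std::runtime_error("unknown target");
}

// Builds the two header lines: the .globl directive and the label itself.
std::string entry_header(Target target) {
    const std::string label = main_label(target);
    assert(!label.empty());
    return ".globl " + label + "\n" + label + ":\n";
}
```

Because only this one function knows about the platform, the rest of the emitter can stay identical for both targets.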

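emit_prologue uses the stack_size_ computed above to emit the stack frame prologue. In this example it emits subq $32, %rsp.

If there are no stack slots, no prologue is needed. For example, the single integer 42 produces no add operation, so there are no temporary stack slots.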
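
The prologue logic can be sketched as a pair of functions that return text. The instruction sequences are copied from the complete output shown later in this section; the function names prologue_text and epilogue_text are hypothetical, and skipping the epilogue when the frame is empty is an assumption made by symmetry with the prologue.

```cpp
#include <cassert>
#include <string>

// Frame setup: save the caller's %rbp, establish ours, reserve the slots.
// Returns nothing when there are no stack slots.
std::string prologue_text(int stack_size) {
    assert(stack_size >= 0 && stack_size % 16 == 0);
    if (stack_size == 0) {
        return "";
    }
    return "    pushq %rbp\n"
           "    movq %rsp, %rbp\n"
           "    subq $" + std::to_string(stack_size) + ", %rsp\n";
}

// Frame teardown, mirroring the prologue (assumed to be skipped together with it).
std::string epilogue_text(int stack_size) {
    if (stack_size == 0) {
        return "";
    }
    return "    movq %rbp, %rsp\n"
           "    popq %rbp\n";
}
```

For the nested add example, prologue_text(32) yields the three setup instructions; for a program that is just 42, both functions return an empty string.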

Translating one add operation

Chapter 1 currently has only one kind of IR operation:

dst = lhs + rhs

The generator uses a fixed template for it:

movq lhs, %rax
addq rhs, %rax
movq %rax, dst

The corresponding code is:

void emit_op(const Op& op) {
    switch (op.kind) {
    case OpKind::add:
        // Load lhs into %rax, add rhs to it, then store the sum in dst's slot.
        out_ += "    movq " + operand_text(op.lhs) + ", %rax\n";
        out_ += "    addq " + operand_text(op.rhs) + ", %rax\n";
        out_ += "    movq %rax, " + stack_slot(op.dst) + "\n";
        return;
    }

    // Reached only if a new OpKind is added without a matching case.
    throw std::runtime_error("unknown IR operation kind");
}

This again uses kind + switch. Chapter 1 has only OpKind::add; when later chapters add new IR operations, each one will be handled explicitly in this same switch.
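
The fixed template can also be read in isolation, with the operands already rendered as text. The function below is a hypothetical standalone version (add_instructions is not a name from the project) that takes three strings and returns the three lines.

```cpp
#include <cassert>
#include <string>

// Fills the three-instruction add template with already-rendered operands.
// lhs and rhs may be immediates ("$1") or stack slots ("-8(%rbp)");
// dst is expected to be a stack slot.
std::string add_instructions(const std::string& lhs,
                             const std::string& rhs,
                             const std::string& dst) {
    assert(!lhs.empty() && !rhs.empty() && !dst.empty());
    std::string out;
    out += "    movq " + lhs + ", %rax\n";  // %rax = lhs
    out += "    addq " + rhs + ", %rax\n";  // %rax += rhs
    out += "    movq %rax, " + dst + "\n";  // dst = %rax
    return out;
}
```

Going through %rax is what lets one template cover every operand combination: x86-64 addq cannot take two memory operands, so a direct slot-to-slot add would not assemble when both inputs are temporaries.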

Writing operands as text

emit_op relies on two helper functions:

operand_text(...)  writes an IR operand as an assembly operand
stack_slot(...)    writes a temporary's name as a stack slot address

IR operands come in two kinds:

integer    e.g. 1
temporary  e.g. t.0

Their assembly text is, respectively:

1    -> $1
t.0  -> -8(%rbp)

The code is:

std::string operand_text(const Operand& operand) const {
    switch (operand.kind) {
    case OperandKind::integer:
        return "$" + std::to_string(operand.integer_value);
    case OperandKind::temp:
        return stack_slot(operand.temp_name);
    }

    throw std::runtime_error("unknown IR operand kind");
}

So the first IR operation:

t.0 = 1 + 2

produces:

movq $1, %rax
addq $2, %rax
movq %rax, -8(%rbp)

The third IR operation:

t.2 = t.0 + t.1

produces:

movq -8(%rbp), %rax
addq -16(%rbp), %rax
movq %rax, -24(%rbp)

That is what operand_text and stack_slot do together: they turn the names and numbers in the IR into text in the assembly file.

Emitting all ops, then handling result

The main flow of emit is short:

for (const auto& op : program.ops) {
    emit_op(op);
}

out_ += "    movq " + operand_text(program.result) + ", %rax\n";
emit_epilogue();
out_ += "    retq\n";

First, all ops are emitted in order. Then program.result is moved into %rax.

In this example:

program.result = t.2
t.2 -> -24(%rbp)

So the final instruction emitted is:

movq -24(%rbp), %rax

This is the last step before returning: putting the little-language program's final result in the return-value location.

Complete output

Generate the assembly:

./mini compile examples/nested_add.lang -o out.s

The core output looks like this. On macOS the entry is _main; on Linux it is main:

.globl _main
_main:
    pushq %rbp
    movq %rsp, %rbp
    subq $32, %rsp
    movq $1, %rax
    addq $2, %rax
    movq %rax, -8(%rbp)
    movq $3, %rax
    addq $4, %rax
    movq %rax, -16(%rbp)
    movq -8(%rbp), %rax
    addq -16(%rbp), %rax
    movq %rax, -24(%rbp)
    movq -24(%rbp), %rax
    movq %rbp, %rsp
    popq %rbp
    retq

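Line it up against the IR:

t.0 = 1 + 2        ->  emit_op emits three instructions and stores the result at -8(%rbp)
t.1 = 3 + 4        ->  emit_op emits three instructions and stores the result at -16(%rbp)
t.2 = t.0 + t.1    ->  emit_op reads from two stack slots and stores the result at -24(%rbp)
return t.2         ->  emit moves program.result into %rax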
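
To check the correspondence by hand, the instruction sequence can be replayed in ordinary C++. This is only a mental-model aid, not part of the compiler: the variable rax stands in for %rax, and slot[0], slot[1], slot[2] stand in for -8(%rbp), -16(%rbp), -24(%rbp). The function name replay_nested_add is made up for this sketch.

```cpp
#include <cassert>

// Replays the generated instructions one by one on plain variables.
long long replay_nested_add() {
    long long rax = 0;
    long long slot[3] = {0, 0, 0};

    rax = 1;         // movq $1, %rax
    rax += 2;        // addq $2, %rax
    slot[0] = rax;   // movq %rax, -8(%rbp)     t.0 = 3

    rax = 3;         // movq $3, %rax
    rax += 4;        // addq $4, %rax
    slot[1] = rax;   // movq %rax, -16(%rbp)    t.1 = 7

    // %rax was overwritten in between, which is why t.0 had to be saved.
    assert(slot[0] == 3 && slot[1] == 7);

    rax = slot[0];   // movq -8(%rbp), %rax
    rax += slot[1];  // addq -16(%rbp), %rax
    slot[2] = rax;   // movq %rax, -24(%rbp)    t.2 = 10

    rax = slot[2];   // movq -24(%rbp), %rax    return value
    return rax;
}
```

The value left in rax at the end is 3 + 7 = 10, the result the program is expected to return.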

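A uniform set of rules is used here on purpose:

Every IR operation computes into %rax.
Every IR operation stores its result in its own stack slot.
When a temporary is needed, it is read back from its stack slot.
Finally, result is moved into %rax.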

This is not the minimum number of instructions, but it keeps Chapter 1's compile path straightforward:

IR -> stack slots -> assembly text

The later chapter on register allocation revisits how to cut down these redundant loads and stores.
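
The whole path, IR -> stack slots -> assembly text, can be condensed into a single function. This is a simplified sketch under stated assumptions, not the project's AssemblyEmitter: the types Operand, AddOp, and Program are cut-down stand-ins, the label is passed in as a string, and the frame is skipped entirely when there are no slots.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Cut-down stand-ins for the IR types (hypothetical shapes).
struct Operand {
    bool is_temp;
    long long value;   // used when !is_temp
    std::string name;  // used when is_temp
};
struct AddOp {
    std::string dst;
    Operand lhs;
    Operand rhs;
};
struct Program {
    std::vector<AddOp> ops;
    Operand result;
};

std::string compile_to_text(const Program& program, const std::string& label) {
    // Rule 1: one 8-byte slot per op result, growing downward from %rbp.
    std::map<std::string, int> slots;
    int offset = -8;
    for (const auto& op : program.ops) {
        slots[op.dst] = offset;
        offset -= 8;
    }

    // Rule 2: frame size rounded up to a multiple of 16.
    const int stack_size = (static_cast<int>(slots.size()) * 8 + 15) / 16 * 16;

    // Renders an operand as "$n" or "off(%rbp)".
    auto text = [&slots](const Operand& operand) -> std::string {
        if (!operand.is_temp) {
            return "$" + std::to_string(operand.value);
        }
        auto it = slots.find(operand.name);
        assert(it != slots.end());
        return std::to_string(it->second) + "(%rbp)";
    };

    std::string out = ".globl " + label + "\n" + label + ":\n";
    if (stack_size > 0) {
        out += "    pushq %rbp\n    movq %rsp, %rbp\n";
        out += "    subq $" + std::to_string(stack_size) + ", %rsp\n";
    }

    // Rule 3: the fixed three-instruction template for every add.
    for (const auto& op : program.ops) {
        out += "    movq " + text(op.lhs) + ", %rax\n";
        out += "    addq " + text(op.rhs) + ", %rax\n";
        out += "    movq %rax, " + text(Operand{true, 0, op.dst}) + "\n";
    }

    // Rule 4: the result goes back into %rax before returning.
    out += "    movq " + text(program.result) + ", %rax\n";
    if (stack_size > 0) {
        out += "    movq %rbp, %rsp\n    popq %rbp\n";
    }
    out += "    retq\n";
    return out;
}

// The IR for (+ (+ 1 2) (+ 3 4)) in the stand-in types.
Program nested_add() {
    Program p{};
    p.ops.push_back(AddOp{"t.0", Operand{false, 1, ""}, Operand{false, 2, ""}});
    p.ops.push_back(AddOp{"t.1", Operand{false, 3, ""}, Operand{false, 4, ""}});
    p.ops.push_back(AddOp{"t.2", Operand{true, 0, "t.0"}, Operand{true, 0, "t.1"}});
    p.result = Operand{true, 0, "t.2"};
    return p;
}
```

Calling compile_to_text(nested_add(), "_main") produces text in the same shape as the complete output above. Each of the four rules maps to one short stretch of the function, which is the point of keeping the rules uniform.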

Try it out

Once the assembly is generated, you can link and run it by hand:

./mini compile examples/nested_add.lang -o out.s
cc out.s -o out
./out
echo $?

On an Apple Silicon Mac, cc may assemble for arm64 by default. Because this project emits x86-64 assembly, you can use this instead:

cc -arch x86_64 out.s -o out

The exit code should be:

10

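Chapter 1 uses the exit code to observe the compiled program's result, which only works for small integers in the range 0-255. The result here is 10, so it can be read directly.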
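
The 0-255 limit comes from how POSIX shells report a normally exiting process: only the low 8 bits of the status are visible in $?. The sketch below models that truncation; visible_exit_status and self_check are hypothetical names, and the function describes the shell's view rather than anything the compiler does.

```cpp
#include <cassert>

// Models what `echo $?` would show for a given program result:
// only the low 8 bits of the value survive.
int visible_exit_status(long long result) {
    return static_cast<int>(static_cast<unsigned long long>(result) & 0xFFu);
}

// Small self-check showing a value that fits and one that wraps around.
inline void self_check() {
    assert(visible_exit_status(10) == 10);   // fits, shown as-is
    assert(visible_exit_status(300) == 44);  // 300 - 256 = 44
}
```

So a program whose result is 300 would appear to exit with 44, which is why this observation method is only reliable for small results like the 10 in this example.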
