Transformer Inference Graph

llama.cpp builds the Transformer forward pass as a GGML compute graph and supports 50+ model architectures.

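What's Covered

Section	Core topics
Concepts	Transformer layer structure, RoPE, attention mechanism
Code walkthrough	The forward pass in llama-model.cpp
Exercises	Compute graph construction, RoPE positional encoding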

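Core Concepts

Compute graph construction: a GGML compute graph is built dynamically for every inference call
RoPE: Rotary Position Embedding, which encodes position by rotating query and key vectors
Attention: Scaled Dot-Product Attention combined with a KV cache
FFN: the feed-forward network (SwiGLU / GeLU)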
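
To make the RoPE and attention entries above concrete, here is a minimal pure-Python sketch. It is not llama.cpp's code: the function names `rope` and `attention` are hypothetical, the base of 10000 is the commonly used default, and rotating adjacent pairs (x[2i], x[2i+1]) is one pairing convention assumed here for illustration (other layouts pair elements differently). The two Python lists passed to `attention` stand in for a KV cache for a single head.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each adjacent pair (vec[2i], vec[2i+1]) by pos * base**(-2i/d).

    Assumption: adjacent-pair layout and base=10000 are illustrative choices.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(q, keys, values):
    """Scaled dot-product attention of one query over cached keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

In a decoding loop under these assumptions, each new token's key would be passed through `rope` with its position and appended to `keys` (its value appended to `values`), so earlier tokens are never recomputed. Because each pair is rotated by an angle proportional to the position, the dot product between a rotated query and a rotated key depends only on the distance between their positions.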

Prerequisites

Learning Path

After working through this topic, you will understand:

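How llama.cpp composes GGML operations into a complete Transformer forward pass
How RoPE positional encoding is computed
Which optimization techniques are used in the attention computation
How different model architectures differ in the code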

→ Next: KV Cache and Batching