Session Pruning

Session pruning trims old tool results from the in-memory context right before each LLM call. It does not rewrite the on-disk session history (*.jsonl).

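When it runs

Pruning runs when mode: "cache-ttl" is enabled and the session's last Anthropic call is older than ttl.
It only affects the set of messages sent to the model for that request.
It only applies to Anthropic API calls (and Anthropic models on OpenRouter).
For best results, match ttl to your model's cacheControlTtl.
After a prune, the TTL window resets; subsequent requests keep reusing the cache until ttl expires again.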
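
The gate described above can be sketched in a few lines. This is a minimal illustration, not Moltbot's implementation: the function names, the "5m"-style duration parser, and the choice to skip pruning when no earlier call is recorded are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical duration units; the source only shows values like "5m", "30m", "1h".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_ttl_seconds(ttl: str) -> int:
    """Parse a duration such as "5m" or "1h" into seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported ttl: {ttl!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def should_prune(mode: str, ttl: str, last_call_at: Optional[float], now: float) -> bool:
    """Return True when the cache-ttl gate would let pruning run.

    last_call_at and now are timestamps in seconds.
    """
    if mode != "cache-ttl":
        return False
    if last_call_at is None:
        # Assumption: with no earlier Anthropic call there is no cache to expire.
        return False
    return now - last_call_at > parse_ttl_seconds(ttl)
```

With ttl set to "5m", a session whose last call was 301 seconds ago passes the gate, while one that was active 2 minutes ago does not.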

Smart defaults (Anthropic)

OAuth or setup-token profiles: enable cache-ttl pruning and set heartbeat to 1h.
API key profiles: enable cache-ttl pruning, set heartbeat to 30m, and default cacheControlTtl to 1h on Anthropic models.
If you set any of these values explicitly, Moltbot does not override them.

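What this improves (cost + cache behavior)

Why prune: Anthropic prompt caching only applies within the TTL. If a session stays idle past the TTL, the next request re-caches the full prompt; pruning first makes the re-cached content smaller.
What gets cheaper: pruning reduces the cacheWrite size of the first request after the TTL expires.
Why the TTL reset matters: once pruning runs, the cache window resets, so follow-up requests can reuse the freshly cached prompt instead of re-caching the full history again.
What it does not do: pruning does not add tokens or "double" costs; it only changes what gets cached on that first post-TTL request.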

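What can be pruned

Only toolResult messages.
User + assistant messages are never modified.
The last keepLastAssistants assistant messages are protected; only tool results older than that cutoff can be pruned.
If there are not enough assistant messages to establish the cutoff, pruning is skipped.
Tool results containing image blocks are skipped (never trimmed or cleared).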
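
The eligibility rules above can be sketched as a small filter. The message shape used here (a dict with a role key and a content value that is either a string or a list of typed blocks) is an assumption for illustration, as are the function names; only the rules themselves come from this page.

```python
from typing import Any, Dict, List


def has_image_block(message: Dict[str, Any]) -> bool:
    """True if the (hypothetical) content list contains a block of type "image"."""
    content = message.get("content")
    if not isinstance(content, list):
        return False
    return any(isinstance(block, dict) and block.get("type") == "image" for block in content)


def prunable_indices(messages: List[Dict[str, Any]], keep_last_assistants: int) -> List[int]:
    """Return indices of tool results that are eligible for pruning."""
    assistant_positions = [i for i, m in enumerate(messages) if m.get("role") == "assistant"]
    if keep_last_assistants < 1 or len(assistant_positions) < keep_last_assistants:
        # Not enough assistant messages to establish the cutoff: skip pruning.
        return []
    # Everything from this index onward is protected.
    cutoff = assistant_positions[-keep_last_assistants]
    return [
        i
        for i in range(cutoff)
        if messages[i].get("role") == "toolResult" and not has_image_block(messages[i])
    ]
```

User and assistant messages never appear in the result, so a caller that only rewrites the returned indices leaves them untouched.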

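Context window estimation

Pruning uses an estimated context window (chars ≈ tokens × 4). The window size is resolved in this order:

1. contextWindow from the model definition (model registry).
2. The models.providers.*.models[].contextWindow override.
3. agents.defaults.contextTokens.
4. Default: 200000 tokens.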
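
As a rough sketch of that lookup, assuming the first source that yields a value wins and that unset sources are represented as None (both are assumptions, as are the function and parameter names):

```python
from typing import Optional

DEFAULT_CONTEXT_TOKENS = 200_000  # fallback stated on this page
CHARS_PER_TOKEN = 4               # chars ≈ tokens × 4


def resolve_context_window_tokens(
    registry_window: Optional[int],
    provider_override: Optional[int],
    agent_default_tokens: Optional[int],
) -> int:
    """Walk the sources in the documented order and return the first usable value."""
    for candidate in (registry_window, provider_override, agent_default_tokens):
        if candidate is not None and candidate > 0:
            return candidate
    return DEFAULT_CONTEXT_TOKENS


def estimate_window_chars(tokens: int) -> int:
    """Convert a token budget into the character budget used for estimation."""
    return tokens * CHARS_PER_TOKEN
```

With nothing configured, the estimate works out to 200000 × 4 = 800000 characters.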

Modes

cache-ttl

Pruning only runs when the last Anthropic call is older than ttl (default 5m).
When it runs, it applies the same combined soft-trim + hard-clear strategy as before.

Soft vs hard pruning

Soft-trim: only applies to oversized tool results.
- Keeps head + tail, inserts ... in between, and appends a note with the original size.
- Skips results that contain image blocks.
Hard-clear: replaces the entire tool result with hardClear.placeholder.
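
A minimal sketch of the two operations on a plain-text tool result follows. The default sizes and the placeholder string are the documented defaults; the exact wording of the size note and the newline layout around ... are hypothetical, and this is not presented as Moltbot's actual code.

```python
def soft_trim(text: str, max_chars: int = 4000, head_chars: int = 1500, tail_chars: int = 1500) -> str:
    """Keep the head and tail of an oversized result and note the original size."""
    if len(text) <= max_chars:
        # Not oversized: leave the result untouched.
        return text
    head = text[:head_chars]
    tail = text[-tail_chars:] if tail_chars > 0 else ""
    # Hypothetical note wording; the source only says a note with the original size is appended.
    note = f"[Tool result trimmed: original size {len(text)} chars]"
    return f"{head}\n...\n{tail}\n\n{note}"


def hard_clear(placeholder: str = "[Old tool result content cleared]") -> str:
    """Replace the whole tool result with the configured placeholder."""
    return placeholder
```

Soft-trim keeps the parts of an output that are most often useful (the start and the end), while hard-clear gives up the content entirely in exchange for the largest saving.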

Tool selection

tools.allow / tools.deny support * wildcards.
Deny wins.
Matching is case-insensitive.
Empty allow list => all tools are allowed.
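
These four rules can be sketched as follows. The function names are hypothetical, and the sketch assumes that * is the only wildcard and that a pattern must match the whole tool name.

```python
import re
from typing import Sequence


def _wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern where * matches any run of characters; everything else is literal."""
    parts = (re.escape(part) for part in pattern.lower().split("*"))
    return re.compile(".*".join(parts))


def _matches_any(name: str, patterns: Sequence[str]) -> bool:
    lowered = name.lower()  # matching is case-insensitive
    return any(_wildcard_to_regex(p).fullmatch(lowered) for p in patterns)


def is_tool_selected(name: str, allow: Sequence[str], deny: Sequence[str]) -> bool:
    """Decide whether results from this tool fall under pruning."""
    if _matches_any(name, deny):
        return False  # deny wins
    if not allow:
        return True   # empty allow list => all tools allowed
    return _matches_any(name, allow)
```

For example, with allow set to ["exec", "read"] and deny set to ["*image*"], a tool named Read is selected, while image_search is rejected by the deny list and browser is rejected because it is not on the allow list.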

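Interaction with other limits

Built-in tools already truncate their own output; session pruning is an extra layer that keeps long conversations from accumulating too much tool output in the model context.
Compaction is a separate mechanism: compaction summarizes and persists, while pruning is transient and applies per request. See /concepts/compaction.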

Defaults (when enabled)

ttl: "5m"
keepLastAssistants: 3
softTrimRatio: 0.3
hardClearRatio: 0.5
minPrunableToolChars: 50000
softTrim: { maxChars: 4000, headChars: 1500, tailChars: 1500 }
hardClear: { enabled: true, placeholder: "[Old tool result content cleared]" }

Examples

Default (off):

json5

{
  agent: {
    contextPruning: { mode: "off" }
  }
}

Enable TTL-aware pruning:

json5

{
  agent: {
    contextPruning: { mode: "cache-ttl", ttl: "5m" }
  }
}

Restrict pruning to specific tools:

json5

{
  agent: {
    contextPruning: {
      mode: "cache-ttl",
      tools: { allow: ["exec", "read"], deny: ["*image*"] }
    }
  }
}

See the config reference: Gateway Configuration
