Tool runtime config: policy, hooks, and context budgets
I shipped a chunk of infrastructure in Zeus this week that I have wanted for a long time: a runtime config layer for tools.
If you have ever built an agent loop, you know the failure mode: you add tools, you add providers, you add evals, and pretty soon the tool layer is a hairball of one-off flags.
This diff is the opposite move. It is a single place to answer:
- Which tools are allowed for this run? (globally, and per provider)
- How big are tool results allowed to be? (and how do we cap them)
- Can I inject small bits of behavior around model selection, prompt building, or tool execution?
This post is a diff tour of what changed and why.
The problem: tool sprawl without a runtime contract
In early versions of Zeus, tools were mostly static. The loop handed the model a fixed list of tool schemas and hoped for the best.
That breaks down once you need any of these:
- a “minimal” profile for constrained runs (readonly-ish debugging)
- a provider-specific tool subset (some models are better with fewer tools)
- a way to turn on hooks for experiments without rebuilding the binary
- predictable budgets for tool outputs so the prompt does not explode
The diff adds three files that together act like a runtime contract for tools:
- llm/tool_runtime_config.py
- llm/tool_policy_pipeline.py
- llm/tool_hook_pipeline.py
And then it wires them into the loop and tool dispatcher.
1) Tool runtime config: defaults + normalization
The core is tool_runtime_config.py. It defines a default config and a normalizer that is deliberately strict. If the config is missing or malformed, it snaps back to defaults.
A few things I like about this design:
A. There is a default per-tool envelope
A tool like shell has a natural set of runtime tunables:
"shell": {
"max_result_chars": 40_000,
"timeout_sec": 20,
"security": "full",
"ask": "off",
"yield_ms": 10_000,
"safe_bins": [],
}
Same for web_fetch (timeouts, redirect limits, cache TTL, readability) and for read and apply_patch (input limits).
The important bit is not the exact numbers. It is that every tool now has a place where its runtime knobs live.
B. Normalization is copy-on-write, with clamping
The normalizer does a deep copy of defaults, then merges user-supplied overrides, but clamps obvious footguns:
- integer fields have min/max bounds
- string lists are de-duplicated
- hooks are normalized into dict entries
This sounds boring, but it is the difference between a “config file” and a config system you can actually ship.
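To make the merge-and-clamp idea concrete, here is a minimal sketch of what such a normalizer can look like. The function name, the `INT_BOUNDS` table, and the trimmed-down defaults are my assumptions for illustration, not the actual Zeus code:

```python
import copy

# Trimmed-down defaults for illustration (the real envelope has more keys).
DEFAULTS = {
    "shell": {"max_result_chars": 40_000, "timeout_sec": 20, "safe_bins": []},
}

# Hypothetical (min, max) bounds for integer fields.
INT_BOUNDS = {
    "max_result_chars": (1_000, 200_000),
    "timeout_sec": (1, 300),
}

def normalize(user_config):
    # Snap back to defaults if the config is missing or malformed.
    if not isinstance(user_config, dict):
        user_config = {}
    config = copy.deepcopy(DEFAULTS)  # copy-on-write: never mutate defaults
    for tool, overrides in user_config.items():
        if tool not in config or not isinstance(overrides, dict):
            continue  # unknown tool or malformed block: ignore it
        for key, value in overrides.items():
            if key not in config[tool]:
                continue  # unknown knob: ignore it
            if key in INT_BOUNDS and isinstance(value, int):
                lo, hi = INT_BOUNDS[key]
                config[tool][key] = max(lo, min(hi, value))  # clamp footguns
            elif isinstance(config[tool][key], list) and isinstance(value, list):
                # De-duplicate string lists while preserving order.
                config[tool][key] = list(dict.fromkeys(value))
            elif isinstance(value, type(config[tool][key])):
                config[tool][key] = value
    return config
```

The strictness is the point: a typo'd key or an out-of-range value degrades to a sane default instead of propagating into the loop.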
2) Tool policy pipeline: filtering the tool set
tool_policy_pipeline.py implements a small but crucial feature: take a list of tool schemas, and filter it down according to a policy block.
There are two layers of policy:
- Global policy
- Provider override (e.g. "grok gets minimal tools, claude gets full tools")
The code resolves an available tool name set from the schemas, applies:
- profile (e.g. minimal)
- allow (hard set)
- also_allow (union)
- deny (subtract)
Then returns a filtered schema list plus some debug metadata.
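The resolution order above can be sketched in a few lines. This is an illustrative reimplementation under assumed names (`apply_tool_policy`, the `PROFILES` table, the debug dict shape), not the actual pipeline code:

```python
# Predefined toolsets; mirrors the "minimal" profile from the post.
PROFILES = {
    "minimal": {"read", "file_search", "web_fetch", "web_search",
                "memory", "knowledge", "ask_user"},
}

def apply_tool_policy(schemas, policy, provider=None):
    # Provider override wins over the global policy.
    merged = dict(policy)
    merged.update(policy.get("providers", {}).get(provider, {}))

    available = {s["name"] for s in schemas}
    if "allow" in merged:                                  # hard set
        allowed = set(merged["allow"]) & available
    elif merged.get("profile") in PROFILES:                # named profile
        allowed = PROFILES[merged["profile"]] & available
    else:
        allowed = set(available)
    allowed |= set(merged.get("also_allow", ())) & available  # union
    allowed -= set(merged.get("deny", ()))                    # subtract

    filtered = [s for s in schemas if s["name"] in allowed]
    debug = {"provider": provider, "allowed": sorted(allowed)}
    return filtered, debug
```

Filtering against the schema list (rather than a hard-coded tool registry) means the policy layer never needs to know which tools exist ahead of time.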
The “minimal” profile is explicit
Right now the predefined toolsets are intentionally small:
"minimal": {
"read",
"file_search",
"web_fetch",
"web_search",
"memory",
"knowledge",
"ask_user",
}
This is not security. It is operational: sometimes you want a run that cannot mutate anything. (And it is a clean way to reproduce failures in evals.)
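For context, a policy block with a provider override might look like the following. The exact shape is my guess; only the key names come from the post:

```python
# Illustrative policy block: a global profile plus a per-provider override
# that narrows grok to the minimal toolset, with one tool unioned back in.
tool_policy = {
    "profile": "full",
    "deny": ["apply_patch"],          # subtract globally
    "providers": {
        "grok": {
            "profile": "minimal",     # override replaces the global profile
            "also_allow": ["shell"],  # union one extra tool back in
        },
    },
}
```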
3) Hook pipeline: experiment without forking the loop
The other half of “runtime control” is hooks.
tool_hook_pipeline.py adds four phases:
- before_model_resolve (provider or model selection)
- before_prompt_build (system prompt patching)
- before_tool_call (block tool, merge input, inject context)
- after_tool_call (append a note, set a field, truncate)
Hooks are intentionally non-fatal. If a hook throws, Zeus records a warning and keeps going.
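The non-fatal contract is easy to sketch. Here is a hypothetical hook runner (function name, hook-dict shape, and logger name are all assumptions) that runs every hook for a phase, merges returned patches, and downgrades exceptions to warnings:

```python
import logging

log = logging.getLogger("zeus.hooks")

def run_hooks(hooks, phase, payload):
    # Each hook is a dict entry (see the normalizer) with a phase and a
    # callable; a hook may return a patch dict to merge into the payload.
    for hook in hooks:
        if hook.get("phase") != phase:
            continue
        try:
            patch = hook["fn"](payload)
            if patch:
                payload.update(patch)
        except Exception as exc:  # hooks are intentionally non-fatal
            log.warning("hook %s failed in %s: %s", hook.get("name"), phase, exc)
    return payload
```

A broken experimental hook then costs you a log line, not a dead agent loop.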
A couple of patterns that already feel useful:
A. Provider forcing
You can force a provider/model from config without touching code:
- set_provider
- set_model
- set_provider_and_model
This is the kind of thing you want for eval harnesses and A/B testing.
B. Tool blocking at runtime
In before_tool_call, you can block a specific tool (or all tools) with a message:
block_tool
This gives you a way to temporarily “pull the plug” on a tool in production while keeping the rest of the agent usable.
C. Light input patching
merge_input is a small hammer that goes a long way. You can inject a timeout, change a parameter, or nudge a tool into a safer mode.
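Putting the two before_tool_call actions together, here is a sketch of how declarative hook entries and a tiny dispatcher could interact. The config shape and the `apply_before_tool_call` helper are illustrative assumptions:

```python
# Illustrative hook entries using the actions named in the post.
hooks = [
    # Pull the plug on shell in production, with a user-visible message.
    {"phase": "before_tool_call", "tool": "shell",
     "action": "block_tool",
     "message": "shell is temporarily disabled"},
    # Nudge web_fetch into a safer mode by merging input fields.
    {"phase": "before_tool_call", "tool": "web_fetch",
     "action": "merge_input",
     "input": {"timeout_sec": 5, "max_redirects": 2}},
]

def apply_before_tool_call(hooks, tool_name, tool_input):
    """Return (possibly patched input, None) or (None, block message)."""
    for hook in hooks:
        if hook.get("phase") != "before_tool_call":
            continue
        if hook.get("tool") not in (tool_name, "*"):  # "*" = all tools
            continue
        if hook.get("action") == "block_tool":
            return None, hook.get("message", "tool blocked")
        if hook.get("action") == "merge_input":
            tool_input = {**tool_input, **hook.get("input", {})}
    return tool_input, None
```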
4) Tool result budgeting: context share ratio
The agent loop also got a concrete improvement: tool result sizes are now capped by a mix of:
- a per-tool max_result_chars
- a global result_context_share_ratio
The logic (in loop.py) is basically:
- take the tool’s hard cap
- compute a share of the model’s context window
- clamp the share between a minimum and the hard cap
This is the difference between “the agent sometimes goes off the rails when rg returns 200k chars” and “tool outputs are predictable”.
I defaulted the ratio to ~0.22 and the minimum to 1200 chars. The specific numbers are easy to tune once you have the mechanism.
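The clamp reduces to a few lines. This sketch uses the post's defaults; the function name and the chars-per-token conversion factor are my assumptions:

```python
def effective_result_cap(max_result_chars, context_window_tokens,
                         ratio=0.22, min_chars=1200, chars_per_token=4):
    # Convert the model's context window to a rough character budget
    # (chars_per_token is an assumed heuristic), take the configured
    # share, then clamp between the floor and the tool's hard cap.
    context_chars = context_window_tokens * chars_per_token
    share = int(context_chars * ratio)
    return max(min_chars, min(share, max_result_chars))
```

So a tool on a small-context model gets a proportionally smaller cap, a huge-context model never exceeds the per-tool hard cap, and no tool is ever squeezed below the floor.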
5) Why this matters
This diff is not flashy. But it unlocks a bunch of engineering moves:
- ship new tools without adding ad-hoc config flags
- run evals under strict tool profiles
- do provider-specific toolsets without branching schemas
- patch system prompts in a controlled way
- keep the loop stable under large outputs
It is also a step toward a clean separation:
- the loop is the loop
- the tools are the tools
- the runtime config defines the envelope they run inside
That separation is what makes an agent system maintainable.
Appendix: the three new files
- apps/client-mac/zeus/Resources/server/llm/tool_runtime_config.py
- apps/client-mac/zeus/Resources/server/llm/tool_policy_pipeline.py
- apps/client-mac/zeus/Resources/server/llm/tool_hook_pipeline.py
If you want to play with this yourself, the implementation is small enough to read end-to-end.