Affected model
any vision-enabled models
What's wrong?
Something else
What it says now, and what it should be
Manifest doesn't yet have a way to auto-route requests containing images to vision-capable models, and the /v1/models endpoint doesn't advertise which models support image input. This breaks downstream AI frameworks (Vercel AI SDK, KiloCode, etc.) that check model capabilities before allowing image data through.
What happens: A user configures Manifest with Kimi 2.5 (or GPT-4o, Claude) behind a custom routing header like x-manifest-tier: StaticImage. The downstream framework sees the /v1/models response — which only returns { id, owned_by } — and hardcodes input: ["text"]. When a request contains image content, the framework strips it before the API call ever reaches Manifest. The image_url never gets there.
What Already Works
The /v1/chat/completions path already passes image_url content through unmodified — no changes needed there. And the recent commit 6f52fdd fixed image_url normalization on the Responses API path too. Manifest's proxy layer is fine; the bottleneck is upstream capability detection.
Feature Requests
- Native vision routing category
Discussion #1563 asked about setting up routing for vision tasks. Currently Manifest has image_generation but no vision or image_understanding category. Adding one would let Manifest auto-route image-containing requests to models that support them:
routes:
- path: /v1/chat/completions
categories:
vision: ["kimi-k2", "gpt-4o", "claude-sonnet"]
- /v1/models endpoint: include modality metadata
Return capability information so downstream frameworks can auto-detect vision support without manual config:
{
"data": [
{
"id": "kimi-k2",
"owned_by": "moonshot",
"capabilities": {
"input": ["text", "image"],
"output": ["text"],
"vision": true
}
}
]
}
This matches how some providers (OpenAI, Anthropic) expose model capabilities and would let auto-discovered models get the right modality defaults without users having to hand-configure each one.
Why This Matters
Anyone using Manifest as an OpenAI-compatible provider for AI SDK workflows hits this wall — the model receives "ERROR: Cannot read image (this model does not support image input)" instead of the actual image, even though the upstream model is fully vision-capable. The fix isn't in Manifest's proxy code; it's in giving downstream tooling the information it needs to trust the model.
Affected model
any vision-enabled models
What's wrong?
Something else
What it says now, and what it should be
Manifest doesn't yet have a way to auto-route requests containing images to vision-capable models, and the /v1/models endpoint doesn't advertise which models support image input. This breaks downstream AI frameworks (Vercel AI SDK, KiloCode, etc.) that check model capabilities before allowing image data through.
What happens: A user configures Manifest with Kimi 2.5 (or GPT-4o, Claude) behind a custom routing header like x-manifest-tier: StaticImage. The downstream framework sees the /v1/models response — which only returns { id, owned_by } — and hardcodes input: ["text"]. When a request contains image content, the framework strips it before the API call ever reaches Manifest. The image_url never gets there.
What Already Works
The /v1/chat/completions path already passes image_url content through unmodified — no changes needed there. And the recent commit 6f52fdd fixed image_url normalization on the Responses API path too. Manifest's proxy layer is fine; the bottleneck is upstream capability detection.
Feature Requests
Discussion #1563 asked about setting up routing for vision tasks. Currently Manifest has image_generation but no vision or image_understanding category. Adding one would let Manifest auto-route image-containing requests to models that support them:
routes:
categories:
vision: ["kimi-k2", "gpt-4o", "claude-sonnet"]
Return capability information so downstream frameworks can auto-detect vision support without manual config:
This matches how some providers (OpenAI, Anthropic) expose model capabilities and would let auto-discovered models get the right modality defaults without users having to hand-configure each one.
Why This Matters
Anyone using Manifest as an OpenAI-compatible provider for AI SDK workflows hits this wall — the model receives "ERROR: Cannot read image (this model does not support image input)" instead of the actual image, even though the upstream model is fully vision-capable. The fix isn't in Manifest's proxy code; it's in giving downstream tooling the information it needs to trust the model.