FEATURE REQUEST: Native Vision Routing Category

### Affected model

any vision-enabled models

### What's wrong?

Something else

### What it says now, and what it should be
Manifest doesn't yet have a way to auto-route requests containing images to vision-capable models, and the /v1/models endpoint doesn't advertise which models support image input. This breaks downstream AI frameworks (Vercel AI SDK, KiloCode, etc.) that check model capabilities before allowing image data through.

**What happens:** A user configures Manifest with Kimi 2.5 (or GPT-4o, Claude) behind a custom routing header like x-manifest-tier: StaticImage. The downstream framework sees the /v1/models response — which only returns { id, owned_by } — and hardcodes input: ["text"]. When a request contains image content, the framework strips it before the API call ever reaches Manifest. The image_url never gets there.

**What Already Works**
The /v1/chat/completions path already passes image_url content through unmodified — no changes needed there. And the recent commit 6f52fdd fixed image_url normalization on the Responses API path too. Manifest's proxy layer is fine; the bottleneck is upstream capability detection.

**Feature Requests**
1. Native vision routing category
Discussion #1563 asked about setting up routing for vision tasks. Currently Manifest has image_generation but no vision or image_understanding category. Adding one would let Manifest auto-route image-containing requests to models that support them:

routes:
  - path: /v1/chat/completions
    categories:
      vision: ["kimi-k2", "gpt-4o", "claude-sonnet"]
2. /v1/models endpoint: include modality metadata
Return capability information so downstream frameworks can auto-detect vision support without manual config:

```
{
  "data": [
    {
      "id": "kimi-k2",
      "owned_by": "moonshot",
      "capabilities": {
        "input": ["text", "image"],
        "output": ["text"],
        "vision": true
      }
    }
  ]
}
```

This matches how some providers (OpenAI, Anthropic) expose model capabilities and would let auto-discovered models get the right modality defaults without users having to hand-configure each one.

**Why This Matters**
Anyone using Manifest as an OpenAI-compatible provider for AI SDK workflows hits this wall — the model receives "ERROR: Cannot read image (this model does not support image input)" instead of the actual image, even though the upstream model is fully vision-capable. The fix isn't in Manifest's proxy code; it's in giving downstream tooling the information it needs to trust the model.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEATURE REQUEST: Native Vision Routing Category #39

Affected model

What's wrong?

What it says now, and what it should be

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

FEATURE REQUEST: Native Vision Routing Category #39

Description

Affected model

What's wrong?

What it says now, and what it should be

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions