Skip to content

FEATURE REQUEST: Native Vision Routing Category #39

@dlasher

Description

@dlasher

Affected model

any vision-enabled models

What's wrong?

Something else

What it says now, and what it should be

Manifest doesn't yet have a way to auto-route requests containing images to vision-capable models, and the /v1/models endpoint doesn't advertise which models support image input. This breaks downstream AI frameworks (Vercel AI SDK, KiloCode, etc.) that check model capabilities before allowing image data through.

What happens: A user configures Manifest with Kimi 2.5 (or GPT-4o, Claude) behind a custom routing header like x-manifest-tier: StaticImage. The downstream framework sees the /v1/models response — which only returns { id, owned_by } — and hardcodes input: ["text"]. When a request contains image content, the framework strips it before the API call ever reaches Manifest. The image_url never gets there.

What Already Works
The /v1/chat/completions path already passes image_url content through unmodified — no changes needed there. And the recent commit 6f52fdd fixed image_url normalization on the Responses API path too. Manifest's proxy layer is fine; the bottleneck is upstream capability detection.

Feature Requests

  1. Native vision routing category
    Discussion #1563 asked about setting up routing for vision tasks. Currently Manifest has image_generation but no vision or image_understanding category. Adding one would let Manifest auto-route image-containing requests to models that support them:

routes:

  • path: /v1/chat/completions
    categories:
    vision: ["kimi-k2", "gpt-4o", "claude-sonnet"]
  1. /v1/models endpoint: include modality metadata
    Return capability information so downstream frameworks can auto-detect vision support without manual config:
{
  "data": [
    {
      "id": "kimi-k2",
      "owned_by": "moonshot",
      "capabilities": {
        "input": ["text", "image"],
        "output": ["text"],
        "vision": true
      }
    }
  ]
}

This matches how some providers (OpenAI, Anthropic) expose model capabilities and would let auto-discovered models get the right modality defaults without users having to hand-configure each one.

Why This Matters
Anyone using Manifest as an OpenAI-compatible provider for AI SDK workflows hits this wall — the model receives "ERROR: Cannot read image (this model does not support image input)" instead of the actual image, even though the upstream model is fully vision-capable. The fix isn't in Manifest's proxy code; it's in giving downstream tooling the information it needs to trust the model.

Metadata

Metadata

Assignees

No one assigned

    Labels

    incorrect-dataA value in the catalog is wrong

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions