docs: Content Signals in robots.txt for AI content-usage preferences by juan-malbeclabs · Pull Request #179 · malbeclabs/docs

juan-malbeclabs · 2026-06-09T20:52:09Z

What

Adds docs/robots.txt with a Content Signals directive declaring AI content-usage preferences:

Content-Signal: search=yes, ai-input=yes, ai-train=yes

Per contentsignals.org / draft-romm-aipref-contentsignals. Addresses the isitagentready.com check "No Content Signals found in robots.txt".
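
As a rough illustration of what the directive expresses, the sketch below parses a Content-Signal line into a mapping from signal name to boolean. The function name `parse_content_signal` is hypothetical, and the parsing rules (comma-separated key=value pairs, yes/no values, case-insensitive field name) are a simplifying assumption rather than the normative grammar from the draft.

```python
def parse_content_signal(line: str) -> dict[str, bool]:
    """Parse 'Content-Signal: search=yes, ai-train=no' into {'search': True, 'ai-train': False}.

    Assumption: signals are comma-separated key=value pairs whose values are yes or no.
    """
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")

    signals: dict[str, bool] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, eq, val = pair.partition("=")
        val = val.strip().lower()
        if not eq or val not in ("yes", "no"):
            raise ValueError(f"malformed signal: {pair!r}")
        signals[key.strip().lower()] = (val == "yes")
    return signals


print(parse_content_signal("Content-Signal: search=yes, ai-input=yes, ai-train=yes"))
# prints {'search': True, 'ai-input': True, 'ai-train': True}
```

A consumer that reads the line this way would treat each signal independently, so changing one value leaves the other two unaffected.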

Why these values

The signal values mirror the stance the site already takes: the AI-crawler allowlist explicitly permits every training-capable crawler (GPTBot, ClaudeBot, Google-Extended, …). We want this public protocol documentation to be indexed (search), used to ground AI answers (ai-input), and available for training (ai-train). To opt out of any one use, flip its value to no.

Notes

main had no docs/robots.txt, so this file is complete on its own (basic allow rules + AI-crawler allowlist + sitemap + Content-Signal).
mkdocs copies docs/robots.txt verbatim to /robots.txt. Verified locally: mkdocs build succeeds and GET /robots.txt returns 200 text/plain with the directive present.
The isitagentready.com scanner reads production, so the check turns green only after this merges and the deploy workflow runs (deploy triggers on push to main).
Overlap with ai-agent-ux: that branch also adds docs/robots.txt. This version is a strict superset of it (same allowlist + sitemap, plus the header comment and Content-Signal block), so any merge conflict resolves by keeping this version.
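
The local verification described in the notes above could be automated along these lines. This is a minimal sketch, not the check actually used in the PR: the helper names `has_directive` and `check_built_site` are hypothetical, and `site` is assumed to be the build output directory (the MkDocs default). The comparison is deliberately strict, so a line with different values or spacing inside the value list does not count as a match.

```python
from pathlib import Path

# The directive this PR adds, as quoted in the description.
EXPECTED = "Content-Signal: search=yes, ai-input=yes, ai-train=yes"


def has_directive(robots_text: str, expected: str = EXPECTED) -> bool:
    """Return True if a non-comment line matches the expected directive."""
    wanted = " ".join(expected.split()).lower()
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0]  # drop robots.txt comments
        if " ".join(line.split()).lower() == wanted:
            return True
    return False


def check_built_site(site_dir: str = "site") -> bool:
    """Check <site_dir>/robots.txt after a build (site_dir is an assumption)."""
    robots = Path(site_dir) / "robots.txt"
    return robots.is_file() and has_directive(robots.read_text(encoding="utf-8"))


# Hypothetical robots.txt body, used only to exercise the helper.
sample = """# example only
User-agent: *
Allow: /

Content-Signal: search=yes, ai-input=yes, ai-train=yes
"""
print(has_directive(sample))  # prints True
```

Running `check_built_site()` after `mkdocs build` only covers the local build; the isitagentready.com scanner still reads production.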

…ots.txt Add docs/robots.txt with a Content-Signal directive (search=yes, ai-input=yes, ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals, declaring that this public protocol documentation may be indexed, used to ground AI answers, and used for training. Values mirror the site's existing stance of explicitly allowing all AI crawlers (GPTBot, ClaudeBot, Google-Extended, etc.). Also includes the basic allow rules, AI-crawler allowlist, and sitemap reference so robots.txt is complete on its own (main had none).

Re-apply the Content Signals declaration (search=yes, ai-input=yes, ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals. It was dropped when main (which already added robots.txt via #176) was merged into this branch. Values mirror the existing AI-crawler allowlist.

…lls) Add hooks/emit_well_known.py to copy well-known/ into the built site's .well-known/ dir, publishing an MCP Server Card (SEP-1649) and an Agent Skills discovery index (v0.2.0) whose digests are computed at build time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
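
The contents of hooks/emit_well_known.py are not shown here, so the following is only a sketch of how such a hook might work, under stated assumptions. `on_post_build` is a standard MkDocs hook event and `config["site_dir"]` is the build output directory; the helper `emit_well_known`, the use of SHA-256, and the shape of the returned mapping are hypothetical and are not taken from the PR, SEP-1649, or the Agent Skills index format.

```python
import hashlib
import shutil
from pathlib import Path


def emit_well_known(src_dir: str, site_dir: str) -> dict[str, str]:
    """Copy src_dir into <site_dir>/.well-known and return {relative path: sha256 hex}.

    Assumption: SHA-256 over the raw file bytes; the real hook may differ.
    """
    src = Path(src_dir)
    dest = Path(site_dir) / ".well-known"
    shutil.copytree(src, dest, dirs_exist_ok=True)

    digests: dict[str, str] = {}
    for path in sorted(src.rglob("*")):
        if path.is_file():
            rel = path.relative_to(src).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def on_post_build(config, **kwargs):
    """MkDocs hook event that runs after the site has been built."""
    emit_well_known("well-known", config["site_dir"])
```

Computing digests at build time, rather than committing them, keeps the published values in step with whatever files are actually copied into the site.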

juan-malbeclabs force-pushed the content-signals-robots branch from 41f5705 to bea37d7 Compare June 9, 2026 21:01

juan-malbeclabs and others added 3 commits June 9, 2026 16:02

Merge branch 'main' into content-signals-robots

aa3f415

armcconnell approved these changes Jun 10, 2026

juan-malbeclabs merged commit d4eb1e4 into main Jun 10, 2026

juan-malbeclabs merged 4 commits into main from content-signals-robots
