
docs: Content Signals in robots.txt for AI content-usage preferences#179

Merged
juan-malbeclabs merged 4 commits into main from content-signals-robots on Jun 10, 2026

juan-malbeclabs commented Jun 9, 2026

What

Adds docs/robots.txt with a Content Signals directive declaring AI content-usage preferences:

Content-Signal: search=yes, ai-input=yes, ai-train=yes

Per contentsignals.org / draft-romm-aipref-contentsignals. Addresses the isitagentready.com check "No Content Signals found in robots.txt".
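To make the shape of the directive concrete, here is a minimal Python sketch of how a consumer might read the line above. It assumes only the comma-separated `key=yes|no` syntax shown in this PR; `parse_content_signal` is a hypothetical helper, not part of any library, and it does not implement the full grammar of the draft.

```python
def parse_content_signal(line: str) -> dict[str, bool]:
    """Parse a 'Content-Signal:' robots.txt line into {signal: allowed}.

    Hypothetical helper: assumes the simple key=yes|no list used in this PR.
    """
    field, sep, value = line.partition(":")
    # Field name compared case-insensitively (an assumption of this sketch).
    if not sep or field.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")
    signals: dict[str, bool] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, eq, val = item.partition("=")
        if not eq:
            raise ValueError(f"malformed signal: {item!r}")
        val = val.strip().lower()
        if val not in ("yes", "no"):
            raise ValueError(f"unexpected value for {key.strip()}: {val!r}")
        signals[key.strip().lower()] = val == "yes"
    return signals


prefs = parse_content_signal("Content-Signal: search=yes, ai-input=yes, ai-train=yes")
print(prefs)  # {'search': True, 'ai-input': True, 'ai-train': True}
```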

Why these values

The signal values mirror the site's existing stance: the AI-crawler allowlist already explicitly permits every training-capable crawler (GPTBot, ClaudeBot, Google-Extended, …). For public protocol documentation we want it indexed (search), used to ground AI answers (ai-input), and available for training (ai-train). To opt out of any of these uses, flip its value to no.
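As a sketch of what flipping a value looks like in practice, the hypothetical Python helper below renders the directive from a mapping of preferences. Only the three signal names come from this PR; the helper name and the fixed output order are illustrative assumptions, and the draft, not this code, governs what a valid directive may contain.

```python
# Signal names taken from this PR; order matches the committed line.
SIGNALS = ("search", "ai-input", "ai-train")


def render_content_signal(prefs: dict[str, bool]) -> str:
    """Build a Content-Signal line with yes/no for each signal present in prefs."""
    unknown = set(prefs) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    parts = [f"{name}={'yes' if prefs[name] else 'no'}" for name in SIGNALS if name in prefs]
    return "Content-Signal: " + ", ".join(parts)


# The values chosen in this PR:
print(render_content_signal({"search": True, "ai-input": True, "ai-train": True}))
# Content-Signal: search=yes, ai-input=yes, ai-train=yes

# Opting out of training only:
print(render_content_signal({"search": True, "ai-input": True, "ai-train": False}))
# Content-Signal: search=yes, ai-input=yes, ai-train=no
```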

Notes

  • main had no docs/robots.txt, so this file is complete on its own (basic allow rules + AI-crawler allowlist + sitemap + Content-Signal).
  • mkdocs copies docs/robots.txt verbatim to /robots.txt. Verified locally: mkdocs build succeeds and GET /robots.txt returns 200 text/plain with the directive present.
  • The isitagentready.com scanner reads production, so the check turns green only after this merges and the deploy workflow runs (deploy triggers on push to main).
  • Overlap with ai-agent-ux: that branch also adds docs/robots.txt. This version is a strict superset of it (same allowlist + sitemap, plus the header comment and Content-Signal block), so any merge conflict resolves by keeping this version.
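The local verification described in the notes (HTTP 200, text/plain, directive present) could be scripted roughly as follows, and the same check can be pointed at production once the deploy workflow has run. This is a standard-library sketch; the function names are hypothetical and the base URL is whatever host you are testing (for example a local `mkdocs serve`, which defaults to http://127.0.0.1:8000).

```python
import urllib.request


def check_robots_response(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"expected text/plain, got {content_type!r}")
    if not any(line.strip().lower().startswith("content-signal:") for line in body.splitlines()):
        problems.append("no Content-Signal line found")
    return problems


def fetch_and_check(base_url: str) -> list[str]:
    """Fetch <base_url>/robots.txt and run the checks above.

    Note: urlopen raises HTTPError on 4xx/5xx, so the status check here
    mainly catches other 2xx responses.
    """
    url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return check_robots_response(resp.status, resp.headers.get("Content-Type", ""), body)
```

An empty list from `fetch_and_check(...)` against production corresponds to the state in which the isitagentready.com check should turn green.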

…ots.txt

Add docs/robots.txt with a Content-Signal directive (search=yes,
ai-input=yes, ai-train=yes) per contentsignals.org /
draft-romm-aipref-contentsignals, declaring that this public protocol
documentation may be indexed, used to ground AI answers, and used for
training. Values mirror the site's existing stance of explicitly
allowing all AI crawlers (GPTBot, ClaudeBot, Google-Extended, etc.).

Also includes the basic allow rules, AI-crawler allowlist, and sitemap
reference so robots.txt is complete on its own (main had none).
juan-malbeclabs force-pushed the content-signals-robots branch from 41f5705 to bea37d7 on June 9, 2026 21:01
juan-malbeclabs and others added 3 commits June 9, 2026 16:02
Re-apply the Content Signals declaration (search=yes, ai-input=yes,
ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals.
It was dropped when main (which already added robots.txt via #176) was
merged into this branch. Values mirror the existing AI-crawler allowlist.
…lls)

Add hooks/emit_well_known.py to copy well-known/ into the built site's
.well-known/ dir, publishing an MCP Server Card (SEP-1649) and an Agent
Skills discovery index (v0.2.0) whose digests are computed at build time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
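The commit above describes the hook only in prose. A rough Python sketch of the copy-and-digest step, written as an MkDocs `on_post_build` hook, might look like this. Everything here is an assumption: the location of `well-known/` next to mkdocs.yml, the use of SHA-256, the logger name, and the helper names are illustrative. The sketch also does not generate the MCP Server Card or the Agent Skills index, whose formats the PR does not show.

```python
import hashlib
import logging
import shutil
from pathlib import Path

log = logging.getLogger("mkdocs.hooks.emit_well_known")  # illustrative name


def sha256_digest(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_well_known(src: Path, site_dir: Path) -> dict[str, str]:
    """Copy src/ into site_dir/.well-known/ and return {relative path: digest}."""
    dest = site_dir / ".well-known"
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return {
        p.relative_to(dest).as_posix(): sha256_digest(p)
        for p in sorted(dest.rglob("*"))
        if p.is_file()
    }


def on_post_build(config, **kwargs):
    """MkDocs hook entry point (assumes well-known/ sits next to mkdocs.yml)."""
    project_root = Path(config.config_file_path).parent
    digests = copy_well_known(project_root / "well-known", Path(config["site_dir"]))
    # A real hook would write these digests into the discovery index; the
    # index schema is not shown in this PR, so this sketch only logs them.
    for name, digest in digests.items():
        log.info(".well-known/%s sha256=%s", name, digest)
```

A file like this would be registered under the `hooks:` key in mkdocs.yml (supported since MkDocs 1.4). Computing digests after the copy means they describe exactly the bytes that get deployed.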
juan-malbeclabs merged commit d4eb1e4 into main on Jun 10, 2026
2 participants