Improve docs UX for AI agents (llms.txt, on-domain markdown, robots, meta)#176

Merged
juan-malbeclabs merged 3 commits into main from ai-agent-ux on Jun 9, 2026

@juan-malbeclabs

Contributor

Why

The docs site is solid for humans but didn't meet current conventions for consumption by AI agents / LLMs. Analysis of the built site/ surfaced concrete gaps (no llms.txt, no on-domain raw Markdown, broken "View Markdown" for translated pages, zero <meta name="description">, no robots.txt, a duplicated toolbar).

What changed

| Area | Change |
| --- | --- |
| llms.txt | New /llms.txt (curated, sectioned index with descriptions) and /llms-full.txt (full concatenated content), generated from the default (English) locale. |
| On-domain Markdown | Every page now also serves clean Markdown at <page>/index.md (all 8 languages), so agents can fetch source Markdown without GitHub. |
| Toolbar fix | "View Markdown" / "Copy Page" / "Ask in ChatGPT/Claude" now use the on-domain .md. The old GitHub-raw link was broken for all 7 translated locales (/es/setup/ → requested docs/es/setup.md, a 404; real source is docs/setup.es.md). Removed the duplicate page-toolbar.js include that rendered the toolbar twice. |
| robots.txt | New docs/robots.txt explicitly allowing AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, …) and referencing the sitemap. |
| Meta descriptions | Added site_description + per-page description front matter. Every page now has a <meta name="description"> (was 0). |
| Housekeeping | Fixed faviron → favicon typo; removed get-pip.py (2 MB), "hold from setup", and .DS_Store; now ignoring .DS_Store / __pycache__. |
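
The translated-page breakage in the Toolbar fix row comes down to path mapping. A minimal sketch of the two mappings follows, written in Python purely for illustration (the real toolbar lives in page-toolbar.js, and both function names here are hypothetical):

```python
def old_github_source_path(page_url: str) -> str:
    """Old behaviour as described in this PR: mirror the page URL under docs/."""
    return "docs/" + page_url.strip("/") + ".md"


def on_domain_markdown_url(page_url: str) -> str:
    """New behaviour: each page directory also serves an index.md."""
    return page_url.rstrip("/") + "/index.md"


# For the Spanish setup page, the old guess points at docs/es/setup.md, which
# does not exist because translations are stored with a locale suffix
# (docs/setup.es.md). The on-domain path does not depend on the source layout.
print(old_github_source_path("/es/setup/"))
print(on_domain_markdown_url("/es/setup/"))
```

Because the on-domain URL is derived only from the page's own URL, it stays valid regardless of how the translation files are named in the repository.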

Implementation note

The mkdocs-llmstxt plugin was the obvious choice but is incompatible with mkdocs-static-i18n — it can't resolve localized page URIs and skips every page. So llms.txt / llms-full.txt and the per-page Markdown are produced by a single MkDocs hook (hooks/emit_markdown.py) that filters to the default locale. No new CI dependency.
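
For readers unfamiliar with MkDocs hooks, the per-page Markdown part of such a hook might look roughly like the sketch below. This is not the contents of hooks/emit_markdown.py: the helper names, the front-matter regex, and the collect-then-write approach are assumptions, and the llms.txt generation and default-locale filtering are left out. Only the on_page_markdown and on_post_build event names and the page.file.dest_uri attribute come from MkDocs itself.

```python
import re
from pathlib import Path, PurePosixPath

# Leading YAML front matter: a '---' line, any content, a closing '---' line.
_FRONT_MATTER = re.compile(r"\A---[ \t]*\n.*?\n---[ \t]*\n", re.DOTALL)

# Destination URI of each Markdown file -> cleaned Markdown, filled per page.
_pages: dict[str, str] = {}


def strip_front_matter(text: str) -> str:
    """Drop a leading YAML front-matter block, if present."""
    return _FRONT_MATTER.sub("", text, count=1).lstrip("\n")


def markdown_dest(html_dest_uri: str) -> str:
    """Map 'es/setup/index.html' to 'es/setup/index.md'."""
    return str(PurePosixPath(html_dest_uri).with_suffix(".md"))


def on_page_markdown(markdown, page, config, files):
    """MkDocs event: remember each page's Markdown and return it unchanged."""
    _pages[markdown_dest(page.file.dest_uri)] = strip_front_matter(markdown)
    return markdown


def on_post_build(config):
    """MkDocs event: write the collected Markdown next to the built HTML."""
    site_dir = Path(config["site_dir"])
    for dest, text in _pages.items():
        out = site_dir / dest
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(text, encoding="utf-8")
```

A file like this would be registered under the hooks: key in mkdocs.yml, which is why no extra plugin dependency is needed.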

Verification

Built locally with mkdocs build:

  • /llms.txt, /llms-full.txt, /robots.txt present at site root.
  • site/setup/index.md and site/es/setup/index.md exist with clean (front-matter-stripped) Markdown, correctly translated.
  • 208 pages now carry <meta name="description"> (pages without their own description fall back to site_description).
  • page-toolbar.js included exactly once per page (was twice).
  • Served via mkdocs serve: toolbar appears once; "View Markdown" opens the on-domain .md and works on translated pages.
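
The static checks in the list above could be scripted along these lines. This is a sketch, not the procedure actually used for this PR: check_site and the report keys are hypothetical names, and the toolbar check is a plain substring count rather than real HTML parsing.

```python
import re
from pathlib import Path

# Files this PR expects at the root of the built site.
ROOT_FILES = ("llms.txt", "llms-full.txt", "robots.txt")
META_DESCRIPTION = re.compile(r'<meta\s+name="description"', re.IGNORECASE)


def check_site(site_dir):
    """Summarise agent-readiness checks for a built MkDocs site directory."""
    site = Path(site_dir)
    report = {
        "missing_root_files": [n for n in ROOT_FILES if not (site / n).is_file()],
        "html_pages": 0,
        "with_description": 0,
        "duplicate_toolbar": [],
    }
    for page in sorted(site.rglob("*.html")):
        html = page.read_text(encoding="utf-8")
        report["html_pages"] += 1
        if META_DESCRIPTION.search(html):
            report["with_description"] += 1
        # More than one reference suggests the script is included twice.
        if html.count("page-toolbar.js") > 1:
            report["duplicate_toolbar"].append(page.relative_to(site).as_posix())
    return report
```

Running check_site("site") after mkdocs build would then give a quick pass/fail picture before opening the served site by hand.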

🤖 Generated with Claude Code

… meta)

- Generate llms.txt and llms-full.txt from the default (English) locale via
  hooks/emit_markdown.py (the mkdocs-llmstxt plugin is incompatible with
  mkdocs-static-i18n).
- Emit clean per-page Markdown as <page>/index.md for every language, served
  on-domain. Repoint the page toolbar's View Markdown / Copy / Ask actions at
  these instead of GitHub raw (which was broken for all 7 translated locales).
- Add robots.txt explicitly allowing AI crawlers and referencing the sitemap.
- Add site_description plus per-page descriptions so every page now has a
  <meta name="description"> (was none).
- Fix duplicate page-toolbar.js include and the favicon typo (faviron).
- Remove repo cruft (get-pip.py, "hold from setup", .DS_Store) and ignore
  .DS_Store / __pycache__.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot and others added 2 commits June 9, 2026 14:02
Every nav page now has a description: front matter, so each gets a unique
<meta name="description"> and a description in llms.txt instead of falling
back to the generic site_description.
@armcconnell

Contributor

Reviewed and built locally — all claims verified (llms.txt/llms-full.txt/robots.txt/sitemap at root, 209/209 pages with meta descriptions, on-domain .md translated and front-matter-stripped, toolbar included once, cruft removed).

Pushed a follow-up commit adding description: front matter to the remaining nav pages, so each now gets a unique <meta name="description"> and a description in llms.txt instead of the generic site_description.

Filed two non-blocking follow-ups (#177 and #178).

Also noted (out of scope): site/ is now in .gitignore but the built output is still tracked in the index (253 files) — a git rm -r --cached site/ would finish that cleanup.

@armcconnell armcconnell left a comment

Contributor

Approving. Changes do what they claim and directly improve agent readiness; verified with a local build. Per-page descriptions added in a follow-up commit; remaining items tracked in #177 and #178.

@juan-malbeclabs juan-malbeclabs merged commit 0ce3e33 into main Jun 9, 2026
1 check failed
juan-malbeclabs added a commit that referenced this pull request Jun 9, 2026
Re-apply the Content Signals declaration (search=yes, ai-input=yes,
ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals.
It was dropped when main (which already added robots.txt via #176) was
merged into this branch. Values mirror the existing AI-crawler allowlist.
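
To make the resulting file concrete, here is a rough sketch of a robots.txt carrying the Content-Signal line described in the commit above. The layout, the render_robots_txt helper, and the sitemap URL are assumptions; the crawler list contains only the names mentioned in this PR, and the real docs/robots.txt may differ.

```python
# Crawlers named in this PR; the actual allowlist is longer.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def render_robots_txt(sitemap_url, signals=None):
    """Build robots.txt text with a Content-Signal line and crawler allow rules."""
    signals = signals or {"search": "yes", "ai-input": "yes", "ai-train": "yes"}
    signal_line = "Content-Signal: " + ", ".join(f"{k}={v}" for k, v in signals.items())
    lines = ["User-agent: *", signal_line, "Allow: /", ""]
    for bot in AI_CRAWLERS:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


# Hypothetical sitemap URL, for illustration only.
print(render_robots_txt("https://docs.example.com/sitemap.xml"))
```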
juan-malbeclabs added a commit that referenced this pull request Jun 10, 2026
…179)

* docs: declare AI content usage preferences via Content Signals in robots.txt

Add docs/robots.txt with a Content-Signal directive (search=yes,
ai-input=yes, ai-train=yes) per contentsignals.org /
draft-romm-aipref-contentsignals, declaring that this public protocol
documentation may be indexed, used to ground AI answers, and used for
training. Values mirror the site's existing stance of explicitly
allowing all AI crawlers (GPTBot, ClaudeBot, Google-Extended, etc.).

Also includes the basic allow rules, AI-crawler allowlist, and sitemap
reference so robots.txt is complete on its own (main had none).

* docs: add Content-Signal directive to robots.txt

Re-apply the Content Signals declaration (search=yes, ai-input=yes,
ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals.
It was dropped when main (which already added robots.txt via #176) was
merged into this branch. Values mirror the existing AI-crawler allowlist.

* docs: publish .well-known agent-discovery files (MCP card + Agent Skills)

Add hooks/emit_well_known.py to copy well-known/ into the built site's
.well-known/ dir, publishing an MCP Server Card (SEP-1649) and an Agent
Skills discovery index (v0.2.0) whose digests are computed at build time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
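
The copy-and-digest step described for hooks/emit_well_known.py can be pictured with the sketch below. It is an illustration under stated assumptions, not the hook itself: the emit_well_known function, the use of SHA-256, and the "sha256:" prefix are guesses, and the schemas of the MCP Server Card and the Agent Skills index are not shown in this PR, so the sketch only returns a path-to-digest mapping.

```python
import hashlib
import shutil
from pathlib import Path


def emit_well_known(src_dir, site_dir):
    """Copy src_dir into <site_dir>/.well-known and return per-file digests."""
    src = Path(src_dir)
    dest = Path(site_dir) / ".well-known"
    shutil.copytree(src, dest, dirs_exist_ok=True)

    digests = {}
    for path in sorted(src.rglob("*")):
        if path.is_file():
            rel = path.relative_to(src).as_posix()
            # Digest algorithm and prefix are assumptions for illustration.
            digests[rel] = "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

Computing digests at build time, as the commit message describes, keeps the published index in step with the files actually shipped.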