Improve docs UX for AI agents (llms.txt, on-domain markdown, robots, meta) by juan-malbeclabs · Pull Request #176 · malbeclabs/docs

juan-malbeclabs · 2026-06-09T13:43:19Z

Why

The docs site works well for humans but didn't follow current conventions for consumption by AI agents and LLMs. An audit of the built `site/` directory surfaced concrete gaps: no llms.txt, no on-domain raw Markdown, a broken "View Markdown" link on translated pages, zero `<meta name="description">` tags, no robots.txt, and a duplicated toolbar.

What changed

| Area | Change |
| --- | --- |
| llms.txt | New `/llms.txt` (curated, sectioned index with descriptions) and `/llms-full.txt` (full concatenated content), generated from the default (English) locale. |
| On-domain Markdown | Every page now also serves clean Markdown at `<page>/index.md` (all 8 languages), so agents can fetch source Markdown without GitHub. |
| Toolbar fix | "View Markdown" / "Copy Page" / "Ask in ChatGPT/Claude" now use the on-domain `.md`. The old GitHub-raw link was broken for all 7 translated locales (`/es/setup/` → requested `docs/es/setup.md`, a 404; real source is `docs/setup.es.md`). Removed the duplicate `page-toolbar.js` include that rendered the toolbar twice. |
| robots.txt | New `docs/robots.txt` explicitly allowing AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, …) and referencing the sitemap. |
| Meta descriptions | Added `site_description` + per-page `description` front matter. Every page now has a `<meta name="description">` (was 0). |
| Housekeeping | Fixed `faviron` → `favicon` typo; removed `get-pip.py` (2 MB), `hold from setup`, `.DS_Store`; ignore `.DS_Store` / `__pycache__`. |
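For reviewers unfamiliar with the format: an llms.txt file (per llmstxt.org) is an H1 title, a `>` blockquote summary, then `##` sections containing `- [Title](url): description` links. Below is a minimal Python sketch of rendering a curated, sectioned index in that shape. `Entry`, `render_llms_txt`, the dict-of-sections structure, and the example URL are hypothetical illustrations, not the code in the hook:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    """One link in the index (hypothetical structure)."""
    title: str
    url: str
    description: str = ""


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Entry]]) -> str:
    """Render H1 title, blockquote summary, then one H2 section per key."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines += [f"## {heading}", ""]
        for e in entries:
            # "- [Title](url): description"; drop the colon when there is no description.
            item = f"- [{e.title}]({e.url})"
            if e.description:
                item += f": {e.description}"
            lines.append(item)
        lines.append("")
    # Exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


print(render_llms_txt(
    "Example Docs",
    "Protocol documentation.",
    {"Getting started": [Entry("Setup", "https://docs.example.com/setup/index.md", "Install a node.")]},
))
```

Keeping the per-entry description optional matches the fallback behavior described under Meta descriptions: a page without its own description still gets a link.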

Implementation note

The mkdocs-llmstxt plugin was the obvious choice, but it is incompatible with mkdocs-static-i18n: it can't resolve localized page URIs and skips every page. So llms.txt, llms-full.txt, and the per-page Markdown are all produced by a single MkDocs hook (hooks/emit_markdown.py). The hook restricts llms.txt and llms-full.txt to the default locale, while per-page Markdown is emitted for every language. No new CI dependency.

Verification

Built locally with mkdocs build:

- /llms.txt, /llms-full.txt, /robots.txt present at site root.
- site/setup/index.md and site/es/setup/index.md exist with clean (front-matter-stripped) Markdown, correctly translated.
- 208 pages now carry <meta name="description"> (pages without their own description fall back to site_description).
- page-toolbar.js included exactly once per page (was twice).
- Served via mkdocs serve: toolbar appears once; "View Markdown" opens the on-domain .md and works on translated pages.
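The build checks above could be scripted so they are rerun on every build. A rough Python sketch, assuming directory URLs (each page is `<dir>/index.html`) and that the toolbar is included via a `<script src="...page-toolbar.js">` tag; `check_site` and both regexes are hypothetical:

```python
import re
from pathlib import Path

META_DESC = re.compile(r'<meta\s+name="description"', re.IGNORECASE)
TOOLBAR = re.compile(r'<script[^>]+src="[^"]*page-toolbar\.js"')


def check_site(site_dir: str) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    site = Path(site_dir)
    problems = []
    # Root-level agent files.
    for name in ("llms.txt", "llms-full.txt", "robots.txt"):
        if not (site / name).is_file():
            problems.append(f"missing /{name}")
    # Per-page checks on every built page.
    for html in sorted(site.rglob("index.html")):
        text = html.read_text(encoding="utf-8")
        rel = html.relative_to(site).as_posix()
        if not META_DESC.search(text):
            problems.append(f"{rel}: no meta description")
        n = len(TOOLBAR.findall(text))
        if n != 1:
            problems.append(f"{rel}: page-toolbar.js included {n} times")
        if not (html.parent / "index.md").is_file():
            problems.append(f"{rel}: no sibling index.md")
    return problems
```

Run as `check_site("site")` after `mkdocs build`; a non-empty result could fail CI.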

🤖 Generated with Claude Code

… meta)

- Generate llms.txt and llms-full.txt from the default (English) locale via hooks/emit_markdown.py (the mkdocs-llmstxt plugin is incompatible with mkdocs-static-i18n).
- Emit clean per-page Markdown as <page>/index.md for every language, served on-domain. Repoint the page toolbar's View Markdown / Copy / Ask actions at these instead of GitHub raw (which was broken for all 7 translated locales).
- Add robots.txt explicitly allowing AI crawlers and referencing the sitemap.
- Add site_description plus per-page descriptions so every page now has a <meta name="description"> (was none).
- Fix duplicate page-toolbar.js include and the favicon typo (faviron).
- Remove repo cruft (get-pip.py, "hold from setup", .DS_Store) and ignore .DS_Store / __pycache__.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
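To make the translated-locale breakage concrete: the old link built a GitHub path straight from the URL, while the repo uses the suffix layout. A small Python sketch of the mapping (the toolbar itself is not Python; these functions are hypothetical and only model the path logic described in the PR):

```python
# Locale prefixes served under /<locale>/. Only "es" is named in the PR,
# so the other six translated locales are left out of this sketch.
TRANSLATED = {"es"}


def naive_source(url_path: str) -> str:
    """What the old GitHub-raw link effectively requested: docs/<url path>.md."""
    return f"docs/{url_path.strip('/')}.md"


def suffix_source(url_path: str) -> str:
    """Where the source actually lives under the suffix layout."""
    parts = [p for p in url_path.strip("/").split("/") if p]
    locale = parts.pop(0) if parts and parts[0] in TRANSLATED else None
    stem = "/".join(parts) or "index"
    return f"docs/{stem}.{locale}.md" if locale else f"docs/{stem}.md"


def on_domain_markdown(url_path: str) -> str:
    """The replacement target: the Markdown copy next to the page's HTML."""
    return url_path.rstrip("/") + "/index.md"


print(naive_source("/es/setup/"))        # the 404 path
print(suffix_source("/es/setup/"))       # the real source
print(on_domain_markdown("/es/setup/"))  # what the toolbar now opens
```

Even `suffix_source` is a guess: with directory URLs, `/guide/` could come from `guide.md` or `guide/index.md`, so no URL-only rule is fully reliable. Serving the built copy on-domain sidesteps that.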

Every nav page now has a description: front matter, so each gets a unique <meta name="description"> and a description in llms.txt instead of falling back to the generic site_description.
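The description rule is: use the page's `description:` front matter if present, otherwise `site_description`. A rough Python sketch of that fallback plus a helper to list pages still missing a description. It uses a naive line scan instead of a YAML parser, so it only handles simple `description: ...` lines; all names are hypothetical:

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Optional

# Leading "---" block at the very start of the file.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def front_matter_description(text: str) -> Optional[str]:
    """Return the description: value from front matter, or None if absent/empty."""
    m = FRONT_MATTER.match(text)
    if not m:
        return None
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip().strip("\"'") or None
    return None


def effective_description(text: str, site_description: str) -> str:
    """Page description if set, else the site-wide fallback."""
    return front_matter_description(text) or site_description


def pages_missing_description(docs_dir) -> list[str]:
    """Source files (relative to docs_dir) that would fall back to site_description."""
    root = Path(docs_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.md")
        if front_matter_description(p.read_text(encoding="utf-8")) is None
    )
```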

armcconnell · 2026-06-09T15:30:47Z

Reviewed and built locally; all claims verified (llms.txt, llms-full.txt, robots.txt, and sitemap at root; 209/209 pages with meta descriptions; on-domain .md translated and front-matter-stripped; toolbar included once; cruft removed).

Pushed a follow-up commit adding description: front matter to the remaining nav pages, so each now gets a unique <meta name="description"> and a description in llms.txt instead of the generic site_description.

Filed two non-blocking follow-ups:

- #177: raw on-domain .md keeps source-relative links that 404 when fetched
- #178: llms.txt SECTIONS list is hand-maintained in parallel with nav
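Why #177 happens: a client resolves a relative link against the URL it fetched, not the source path. A link `setup.md` written in `install.md` resolves against `/install/index.md` to `/install/setup.md`, which doesn't exist. A Python sketch of the effect and of one possible rewrite for default-locale pages; `rewrite_link` and the example domain are hypothetical, and fragments (`setup.md#x`) are not handled:

```python
import posixpath
from urllib.parse import urljoin


def fetched_url(md_url: str, link: str) -> str:
    """How an agent resolves a link found inside a fetched .md file."""
    return urljoin(md_url, link)


def rewrite_link(src_path: str, link: str) -> str:
    """Map a source-relative .md link to the target page's on-domain index.md.

    src_path is the linking page's path relative to docs/ (e.g. 'install.md').
    Non-.md and absolute links are returned unchanged.
    """
    if "://" in link or not link.endswith(".md"):
        return link
    target = posixpath.normpath(posixpath.join(posixpath.dirname(src_path), link))
    stem = target[: -len(".md")]
    if posixpath.basename(stem) == "index":
        stem = posixpath.dirname(stem)  # guide/index.md is served at /guide/
    return "/" + (stem + "/" if stem else "") + "index.md"


print(fetched_url("https://docs.example.com/install/index.md", "setup.md"))
print(rewrite_link("install.md", "setup.md"))
```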

Also noted (out of scope): site/ is now in .gitignore, but the built output is still tracked in the index (253 files); running `git rm -r --cached site/` would finish that cleanup.

armcconnell

Approving. Changes do what they claim and directly improve agent readiness; verified with a local build. Per-page descriptions added in a follow-up commit; remaining items tracked in #177 and #178.

Re-apply the Content Signals declaration (search=yes, ai-input=yes, ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals. It was dropped when main (which already added robots.txt via #176) was merged into this branch. Values mirror the existing AI-crawler allowlist.
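A Python sketch of a robots.txt in the shape described (AI-crawler allowlist, Content-Signal line in the `*` group, sitemap), checked with the standard-library `urllib.robotparser`, which ignores the unknown `Content-Signal` line. The bot list is abbreviated and the sitemap URL is a placeholder, so this is not the file in the repo; `content_signals` is a hypothetical helper:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=yes
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://docs.example.com/sitemap.xml
"""


def content_signals(robots_txt: str) -> dict:
    """Parse Content-Signal values into {signal: True/False}."""
    signals = {}
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "content-signal":
            for pair in value.split(","):
                name, eq, flag = pair.strip().partition("=")
                if eq:
                    signals[name.strip()] = flag.strip().lower() == "yes"
    return signals


rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
print(rp.can_fetch("GPTBot", "https://docs.example.com/setup/index.md"))
print(rp.site_maps())
print(content_signals(ROBOTS_TXT))
```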

…179)

* docs: declare AI content usage preferences via Content Signals in robots.txt

  Add docs/robots.txt with a Content-Signal directive (search=yes, ai-input=yes, ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals, declaring that this public protocol documentation may be indexed, used to ground AI answers, and used for training. Values mirror the site's existing stance of explicitly allowing all AI crawlers (GPTBot, ClaudeBot, Google-Extended, etc.). Also includes the basic allow rules, AI-crawler allowlist, and sitemap reference so robots.txt is complete on its own (main had none).

* docs: add Content-Signal directive to robots.txt

  Re-apply the Content Signals declaration (search=yes, ai-input=yes, ai-train=yes) per contentsignals.org / draft-romm-aipref-contentsignals. It was dropped when main (which already added robots.txt via #176) was merged into this branch. Values mirror the existing AI-crawler allowlist.

* docs: publish .well-known agent-discovery files (MCP card + Agent Skills)

  Add hooks/emit_well_known.py to copy well-known/ into the built site's .well-known/ dir, publishing an MCP Server Card (SEP-1649) and an Agent Skills discovery index (v0.2.0) whose digests are computed at build time.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
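A minimal Python sketch of the copy-and-digest step. It assumes `well-known/` sits next to `mkdocs.yml` and uses a hypothetical `sha256:<hex>` digest string; the actual digest format and how digests are written into the Agent Skills index are defined by that spec and the hook, and are not reproduced here. `publish_well_known` and `sha256_digest` are hypothetical names:

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_digest(path: Path) -> str:
    # Hypothetical "sha256:<hex>" form; the real index may encode digests differently.
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()


def publish_well_known(src_dir, site_dir) -> dict[str, str]:
    """Copy src_dir into <site_dir>/.well-known and return {relative path: digest}."""
    src, dest = Path(src_dir), Path(site_dir) / ".well-known"
    shutil.copytree(src, dest, dirs_exist_ok=True)  # Python 3.8+
    return {
        p.relative_to(dest).as_posix(): sha256_digest(p)
        for p in sorted(dest.rglob("*"))
        if p.is_file()
    }


def on_post_build(config, **kwargs):
    # MkDocs hook event; config_file_path points at mkdocs.yml.
    root = Path(config["config_file_path"]).parent
    publish_well_known(root / "well-known", config["site_dir"])
```

Computing digests over the copied files (rather than the sources) ties each digest to exactly what is served.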

juan-malbeclabs requested a review from armcconnell June 9, 2026 13:48

github-actions Bot and others added 2 commits June 9, 2026 14:02

chore: auto-translate docs

ebd503e

docs: add per-page descriptions to remaining nav pages

ce38526


This was referenced Jun 9, 2026

Raw on-domain .md uses source-relative links that 404 when fetched #177

Open

llms.txt SECTIONS list is hand-maintained in parallel with nav #178

Open

armcconnell approved these changes Jun 9, 2026


juan-malbeclabs merged commit 0ce3e33 into main Jun 9, 2026
1 check failed

juan-malbeclabs merged 3 commits into main from ai-agent-ux
