PDF: single-layer text selection with gen-time margins #578
Records the design discussion for a mostly single-layer text model (à la pdf2htmlEX) as an alternative to the current dual-layer selection approach on pdf-text-selection-layer. Clean glyphs carry real Unicode in one findable/selectable layer positioned by gen-time margins; unclean glyphs (ligatures / no_unicode) are painted via CSS generated content with an overlapping transparent real-Unicode overlay. Design only; not implemented.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Mq2d2eFjjCL8cHpU9pHugq
Replaces the dual-layer (visual PUA + transparent selection) scheme with a single absolutely-positioned line block per PDF line whose runs flow inline.

Clean, unambiguous glyphs carry real Unicode cmap entries and render directly as selectable/findable DOM text. Unclean glyphs (ligatures, ambiguous cmaps, no_unicode) are painted via CSS `content:attr(data-g)` generated content, which keeps the PUA glyph out of the DOM text stream so it can never break a word mid-sequence for Ctrl+F or double-click. A zero-width transparent `.ov` overlay carries the real Unicode alongside.

Inter-run x-position corrections are computed at generation time as `margin-left = (pdf_x − prev_end) × pt_to_px`; the browser renders the embedded font whose advances are known, so no runtime JS measurement is needed. Baseline drift > 0.6 em and backward jumps > 0.5 em open a new block. A path or image item flushes the open block first to preserve paint order.

See src/odr/internal/pdf/SINGLE_LAYER_SELECTION_PLAN.md for the full design discussion (committed separately as the branch's first commit).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Mq2d2eFjjCL8cHpU9pHugq
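For readers skimming the commit message, here is a minimal sketch of the margin formula and the two block-break thresholds it describes. The `Run` struct, the function names, and the choice of the incoming run's font size as the em are assumptions made for illustration; `round2` is only a guess at the helper of that name in the diff. This is not the code in the branch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical run record; field names are illustrative, not from the branch.
struct Run {
  double x_pt;         // PDF x of the run's left edge, in points
  double end_pt;       // PDF x of the run's right edge, in points
  double baseline_pt;  // PDF y of the baseline, in points
  double font_size_pt; // font size, in points (taken as 1 em here)
};

// Round to two decimals so the emitted CSS value stays short (assumed behaviour).
inline double round2(double v) { return std::round(v * 100.0) / 100.0; }

// margin-left = (pdf_x - prev_end) * pt_to_px. The result is signed:
// a negative value pulls the run back over the previous one.
inline double margin_left_px(const Run &prev, const Run &next, double pt_to_px) {
  return round2((next.x_pt - prev.end_pt) * pt_to_px);
}

// True when `next` should start a new line block instead of flowing after `prev`:
// baseline drift above 0.6 em, or a backward jump above 0.5 em.
inline bool opens_new_block(const Run &prev, const Run &next) {
  assert(next.font_size_pt > 0.0);
  const double em = next.font_size_pt;
  const double drift = std::fabs(next.baseline_pt - prev.baseline_pt);
  const double backward = prev.end_pt - next.x_pt; // positive when moving left
  return drift > 0.6 * em || backward > 0.5 * em;
}
```

With `pt_to_px = 96.0 / 72.0`, a run starting 3 pt after the previous run's right edge gets a 4 px margin, and a run starting 1 pt before it gets a negative margin rather than a new block, since 1 pt is below the 0.5 em backward threshold at a 10 pt font size.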
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0ed0ee3e8f
    // Gen-time gap to the previous run's right edge (signed). This is
    // the run's `margin-left`: the browser flows the previous run by
    // its embedded-font advance, so this reproduces the PDF x-position
    // (exact when the font's `hmtx` matches the PDF `/Widths`).
    margin_px = round2((ox - prev_end) * pt_to_px);
Base flowed-run margins on rendered advances
When multiple PDF text segments are flowed into one line, this margin only accounts for the PDF gap measured from the previous segment's parsed `/Widths` advance. The browser, however, advances the previous inline run using the emitted font's own metrics (or an arbitrary fallback font when `font == 0`), and this change no longer absolutely positions each segment. For PDFs with non-embedded or unsupported fonts, or embedded fonts whose `hmtx` differs from the PDF widths, every following segment on the same baseline is shifted by that metric mismatch. Either compute the margin from the rendered/emitted advance, or keep those runs separately positioned.
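To make the concern concrete, here is a rough sketch of how the mismatch propagates and what the two suggested remedies might look like. All names are hypothetical, and it assumes both the PDF `/Widths` and the emitted font's advances have already been normalised to 1/1000 em; the branch may represent them differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: sums per-glyph advances (1/1000 em) into points.
inline double advance_pt(const std::vector<double> &widths_milli_em,
                         double font_size_pt) {
  assert(font_size_pt > 0.0);
  double sum = 0.0;
  for (double w : widths_milli_em) sum += w;
  return sum / 1000.0 * font_size_pt;
}

// Horizontal error inherited by every later run on the line when the margin
// is derived from the PDF advance but the browser lays out with the emitted
// font's advance.
inline double flow_error_pt(const std::vector<double> &pdf_widths,
                            const std::vector<double> &rendered_widths,
                            double font_size_pt) {
  return advance_pt(rendered_widths, font_size_pt) -
         advance_pt(pdf_widths, font_size_pt);
}

// Remedy 1: measure the gap from where the browser will actually end the
// previous run, i.e. its left edge plus the rendered advance.
inline double corrected_margin_pt(double prev_x_pt, double next_x_pt,
                                  const std::vector<double> &rendered_widths,
                                  double font_size_pt) {
  return next_x_pt - (prev_x_pt + advance_pt(rendered_widths, font_size_pt));
}

// Remedy 2: only flow inline when the font is emitted and the metrics agree
// within a tolerance; otherwise keep the run separately positioned.
inline bool can_flow_inline(bool font_emitted, double error_pt,
                            double tolerance_pt) {
  return font_emitted && std::fabs(error_pt) <= tolerance_pt;
}
```

For example, two glyphs with PDF widths of 500 each but rendered advances of 520 each at 10 pt give a 0.4 pt error; a following run at x = 113 pt after a run starting at x = 100 pt would need a 2.6 pt margin rather than the 3 pt that the PDF advance alone suggests.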
🤖 Generated with Claude Code
Summary
Implements a single-layer PDF text model (à la pdf2htmlEX) as an alternative to the current dual-layer approach on `pdf-text-selection-layer`. The full design rationale is in `src/odr/internal/pdf/SINGLE_LAYER_SELECTION_PLAN.md`.

How it works:
- One absolutely-positioned `<div>` per PDF line; runs inside flow inline, each nudged by a gen-time `margin-left = (pdf_x − prev_end) × pt_to_px`.
- Unclean glyphs (ligatures, ambiguous cmaps, `no_unicode`) are painted via CSS `::before{content:attr(data-g)}` generated content. The PUA glyph is outside the DOM text stream, so it never breaks Ctrl+F or double-click mid-word. A zero-width `.ov` overlay carries the real Unicode alongside.

Comparison with `pdf-text-selection-layer`: [comparison table not recoverable; surviving cells: `letter-spacing`, `margin-left`, `user-select:none`]

Test plan
- […] (`odr_test`): 658 passed, 8 skipped (pre-existing), 0 failed
- […] `pdf-text-selection-layer`
- `.ov` overlay find-contiguity across `inline-block` boundary (known trade-off; see plan §3)
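As a rough illustration of the unclean-glyph path described under "How it works", the sketch below emits one such glyph as markup. Only `data-g`, `::before{content:attr(data-g)}`, and the `.ov` class come from the description; the wrapper class `g`, the nesting of `.ov` inside the wrapper, the `.ov` style rule, and both function names are assumptions and may not match what the branch generates.

```cpp
#include <cassert>
#include <string>

// Assumed stylesheet: the first rule is quoted from the PR description, the
// second is a guess at what "zero-width transparent overlay" could mean.
const char *const kUncleanGlyphCss =
    ".g::before{content:attr(data-g)}"
    ".ov{display:inline-block;width:0;color:transparent}";

// Escapes characters that are unsafe in HTML text and double-quoted attributes.
std::string escape_html(const std::string &in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
    case '&': out += "&amp;"; break;
    case '<': out += "&lt;"; break;
    case '>': out += "&gt;"; break;
    case '"': out += "&quot;"; break;
    default: out += c;
    }
  }
  return out;
}

// Hypothetical emitter. The PUA code point lives only in the `data-g`
// attribute, so it is painted as generated content and never enters the DOM
// text stream; the real Unicode sits in the `.ov` child for find and selection.
std::string emit_unclean_glyph(const std::string &pua_utf8,
                               const std::string &unicode_utf8) {
  assert(!pua_utf8.empty());
  return "<span class=\"g\" data-g=\"" + escape_html(pua_utf8) +
         "\"><span class=\"ov\">" + escape_html(unicode_utf8) +
         "</span></span>";
}
```

Because the only text node is the `.ov` content, the page's text for an "fi" ligature is just "fi", which is what keeps Ctrl+F and double-click word selection from seeing the PUA code point.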