PDF: single-layer text selection with gen-time margins by andiwand · Pull Request #578 · opendocument-app/OpenDocument.core

andiwand · 2026-06-30T19:13:20Z

🤖 Generated with Claude Code

Summary

Implements a single-layer PDF text model (à la pdf2htmlEX) as an alternative to the current dual-layer approach on pdf-text-selection-layer. The full design rationale is in src/odr/internal/pdf/SINGLE_LAYER_SELECTION_PLAN.md.

How it works:

One absolutely-positioned <div> per PDF line; runs inside flow inline, each nudged by a gen-time margin-left = (pdf_x − prev_end) × pt_to_px.
Clean glyphs (unambiguous Unicode↔glyph) get real-Unicode cmap entries → the visible text is the DOM text → natively findable, selectable, copyable. No JS.
Unclean glyphs (ligatures, ambiguous cmap, no_unicode) are painted via CSS ::before{content:attr(data-g)} generated content — the PUA glyph is outside the DOM text stream, so it never breaks Ctrl+F or double-click mid-word. A zero-width .ov overlay carries the real Unicode alongside.
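The three points above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Run` struct, `emit_run`, and the `g` class name are hypothetical, while `data-g`, `.ov`, `round2`, and the margin formula are taken from the description and the diff. The text is assumed to be HTML-escaped already.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical description of one text run on a PDF line.
struct Run {
  std::string text;      // real Unicode (UTF-8), assumed already HTML-escaped
  std::string pua_glyph; // PUA code point (UTF-8) for unclean runs, else empty
  double pdf_x;          // run origin in PDF points
  double pdf_end;        // run right edge in points; prev_end for the next run
};

// Round to two decimals, mirroring the `round2` helper used in the diff.
double round2(double v) { return std::round(v * 100.0) / 100.0; }

// margin-left = (pdf_x - prev_end) * pt_to_px, as stated in the summary.
double margin_px(const Run &run, double prev_end, double pt_to_px) {
  assert(pt_to_px > 0.0);
  return round2((run.pdf_x - prev_end) * pt_to_px);
}

// Emit one inline run. Clean runs carry their text directly in the DOM.
// Unclean runs carry the PUA glyph in `data-g` (to be painted by a CSS rule
// such as `::before{content:attr(data-g)}`) plus an `.ov` overlay holding
// the real Unicode.
std::string emit_run(const Run &run, double prev_end, double pt_to_px) {
  char margin[32];
  std::snprintf(margin, sizeof(margin), "%.2f",
                margin_px(run, prev_end, pt_to_px));
  std::string html =
      "<span style=\"margin-left:" + std::string(margin) + "px\"";
  if (run.pua_glyph.empty()) {
    return html + ">" + run.text + "</span>";
  }
  return html + " class=\"g\" data-g=\"" + run.pua_glyph +
         "\"><span class=\"ov\">" + run.text + "</span></span>";
}
```

A caller would walk the runs of a line left to right, passing each run's `pdf_end` as the `prev_end` of the next one.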

Comparison with pdf-text-selection-layer:

| | `pdf-text-selection-layer` | this branch |
|---|---|---|
| Layers | 2 (visual PUA + transparent selection) | 1 (single selectable layer) |
| X-position fit | runtime JS `letter-spacing` | gen-time `margin-left` |
| Font for selection | unknown system font → JS fit needed | embedded font (advances known) |
| Ligature/no_unicode | PUA in DOM text + `user-select:none` | CSS generated content (out of text stream) |
| JS dependency | yes (on-load fit pass) | none |

Test plan

Full test suite (odr_test): 658 passed, 8 skipped (pre-existing), 0 failed
Visual comparison of PDF output in a browser vs pdf-text-selection-layer
Verify Ctrl+F / double-click / triple-click on a PDF with ligature glyphs
Verify .ov overlay find-contiguity across inline-block boundary (known trade-off; see plan §3)

Records the design discussion for a mostly single-layer text model (à la pdf2htmlEX) as an alternative to the current dual-layer selection approach on pdf-text-selection-layer. Clean glyphs carry real Unicode in one findable/selectable layer positioned by gen-time margins; unclean glyphs (ligatures / no_unicode) are painted via CSS generated content with an overlapping transparent real-Unicode overlay. Design only; not implemented.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Mq2d2eFjjCL8cHpU9pHugq

Replaces the dual-layer (visual PUA + transparent selection) scheme with a single absolutely-positioned line block per PDF line whose runs flow inline.

Clean, unambiguous glyphs carry real Unicode cmap entries and render directly as selectable/findable DOM text. Unclean glyphs (ligatures, ambiguous cmaps, no_unicode) are painted via CSS `content:attr(data-g)` generated content, which keeps the PUA glyph out of the DOM text stream so it can never break a word mid-sequence for Ctrl+F or double-click, with a zero-width transparent `.ov` overlay carrying the real Unicode alongside.

Inter-run x-position corrections are computed at generation time as `margin-left = (pdf_x − prev_end) × pt_to_px`; the browser renders the embedded font whose advances are known, so no runtime JS measurement is needed. Baseline drift > 0.6 em and backward jumps > 0.5 em open a new block. A path or image item flushes the open block first to preserve paint order.

See src/odr/internal/pdf/SINGLE_LAYER_SELECTION_PLAN.md for the full design discussion (committed separately as the branch's first commit).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Mq2d2eFjjCL8cHpU9pHugq
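The block-break rule from this commit message might look something like the predicate below. It is only a sketch: the function name and parameters are hypothetical, and it assumes that "em" means the current font size and that all coordinates share one unit. Only the 0.6 em and 0.5 em thresholds come from the commit message.

```cpp
#include <cassert>
#include <cmath>

// Decide whether the next run must start a new absolutely-positioned block.
// All coordinates are in the same unit (for example PDF points); font_size
// is the size that defines 1 em (an assumption, not confirmed by the PR).
bool opens_new_block(double prev_end_x, double prev_baseline, double next_x,
                     double next_baseline, double font_size) {
  assert(font_size > 0.0);
  // Vertical distance between baselines, expressed in em.
  const double drift_em = std::fabs(next_baseline - prev_baseline) / font_size;
  // How far the next run starts to the left of the previous run's right
  // edge, in em. Negative values are ordinary forward gaps.
  const double backward_em = (prev_end_x - next_x) / font_size;
  return drift_em > 0.6 || backward_em > 0.5;
}
```

Small backward jumps stay in the same block, where they would presumably be expressed as a negative `margin-left`.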

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0ed0ee3e8f


chatgpt-codex-connector · 2026-06-30T19:17:46Z

+            // Gen-time gap to the previous run's right edge (signed). This is
+            // the run's `margin-left`: the browser flows the previous run by
+            // its embedded-font advance, so this reproduces the PDF x-position
+            // (exact when the font's `hmtx` matches the PDF `/Widths`).
+            margin_px = round2((ox - prev_end) * pt_to_px);


Base flowed-run margins on rendered advances

When multiple PDF text segments are flowed into one line, this margin only accounts for the PDF gap from the previous segment's parsed /Widths advance. The browser, however, advances the previous inline run using the emitted font's own metrics (or an arbitrary fallback font when font == 0), and this change no longer absolutely positions each segment. For PDFs with non-embedded/unsupported fonts or embedded fonts whose hmtx differs from the PDF widths, every following segment on the same baseline is shifted by that metric mismatch; compute the margin from the rendered/emitted advance or keep those runs separately positioned.
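One way to read the reviewer's first suggestion is sketched below: track where the browser's pen actually ends up, using the emitted font's advance, and derive each margin from that instead of from the PDF `/Widths` advance. The `Segment` struct and the function name are hypothetical, and the sketch assumes the rendered advance can be obtained from the emitted font at generation time, which the PR does not confirm.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-segment data, all in PDF points.
struct Segment {
  double pdf_x;            // segment origin as placed by the PDF
  double pdf_advance;      // advance according to the PDF /Widths
  double rendered_advance; // advance the browser will apply (emitted font)
};

// Compute margin-left (in px) for each segment so that it lands on its PDF
// x-position even when rendered_advance differs from pdf_advance.
std::vector<double> margins_from_rendered_advance(
    const std::vector<Segment> &segments, double line_x, double pt_to_px) {
  assert(pt_to_px > 0.0);
  std::vector<double> margins;
  double pen = line_x; // where the inline flow currently is, in points
  for (const Segment &s : segments) {
    const double margin = (s.pdf_x - pen) * pt_to_px;
    margins.push_back(std::round(margin * 100.0) / 100.0);
    // The browser advances by the rendered width, not the PDF width.
    pen = s.pdf_x + s.rendered_advance;
  }
  return margins;
}
```

When `rendered_advance` equals `pdf_advance` this reduces to the `(pdf_x − prev_end) × pt_to_px` formula; when they differ, the mismatch is absorbed by the next margin rather than shifting every following segment.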


andiwand and others added 2 commits June 30, 2026 19:58

chatgpt-codex-connector Bot reviewed Jun 30, 2026


andiwand mentioned this pull request Jul 1, 2026

PDF text: dual-layer + single-layer rendering with PdfTextMode option #579

Open

5 tasks

andiwand wants to merge 2 commits into main from pdf-single-layer-selection
