Parser enhancements wave: integration base by StreamDemon · Pull Request #71 · StreamDemon/sploosh

StreamDemon · 2026-07-02T06:18:52Z

Summary

Integration base branch for the parser enhancement wave: quality/optimization sub-PRs (no behavior fixes) are squash-merged into this branch, which then lands in main with a true merge commit. This is the same stacked-PR pattern as the correctness wave (PR #69, "Parser correctness wave: integration base") and the v0.5.10 cleanup (PR #40, "Spec v0.5.10: Cleanup batch (integration)").

Wave complete — all six roadmap items shipped:

Binary/Unary op: String → dedicated operator enums (#79, "Replace operator strings with UnaryOp and BinaryOp enums")
Token.lexeme: String → span-based slicing (Token { kind, span } + Token::text(source); slicing was chosen over interning because the token stream never outlives the source buffer in the bootstrap pipeline) (#76, "Store spans instead of lexeme strings on tokens")
Preserve attribute arguments in the AST (Attribute.args/.span per §16 attr_args; actor handlers are now Handler { attrs, function }) (#80, "Preserve attribute arguments and actor-handler attrs in the AST")
Dedupe the numeric-suffix list in the lexer (shared NUMERIC_SUFFIXES const, &'static str return) (#75, "Dedupe the lexer numeric-suffix list")
~~at/eat/expect take &TokenKind~~ Re-sliced: Token/TokenKind derive Copy, which removes every parse-loop .clone() by using by-value call sites instead of reference threading. Same goal, simpler result; only possible once #76 removed the owned lexeme (#77, "Make Token and TokenKind Copy, drop parse-loop clones")
Corpus expansion for all accepted shapes: six new fixtures (extern/onchain, casts + literal zoo, async/.await, modules/use, attributes, struct-literal shapes) and auto-discovery of tests/corpus/*.sp in the harness (#81, "Expand the corpus to cover every accepted grammar shape")

Ride-along: issue #78 (spurious "expected identifier" for non-ident-headed type args like Vec<&str>) was found while designing the attribute-argument parser, verified with a probe, and filed for a separate fix — it is a behavior change, so it deliberately does not ride this wave.

Related Issue

None (review-driven; companion to PR #69).

Spec Sections Affected

None — implementation quality only; no grammar or behavior changes.

Checklist

Code follows the Sploosh design principles (one way to do it, explicit over implicit, etc.)
Documentation updated in relevant docs/ pages — N/A, no behavior change
Tests added or updated
All build targets still compile (if applicable)
Spec-only PR (skip Build Targets section if checked)

Build Targets Tested

cargo fmt --all -- --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace green on the assembled base (54 tests, 13 corpus fixtures). Every sub-PR was individually CI-green and cubic-reviewed before merging.

Test Plan

Each sub-PR carried its own tests: table-driven suffix coverage (#75), Token::text span assertions (#76), structural operator assertions (#79), five attribute-shape/span tests (#80), and 13 auto-discovered corpus fixtures (#81).
Full workspace suite green at every merge point and on the final base.

Empty seed commit for the enhancement-wave base PR; the roadmap lives in the PR description. Rebased onto main after the parser correctness wave (PR #69) merged.

`numeric_suffix` and `validate_numeric_body` each carried their own copy of the 13-entry suffix table, so any future suffix change had to be made twice or the two paths would drift apart. Hoist the table into a shared `NUMERIC_SUFFIXES` const and return the matched `&'static str` instead of allocating a fresh `String` on every suffix scan. A new test iterates the shared table and lexes `1<suffix>` for every entry, so additions to the list are covered automatically.
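A minimal sketch of the shape this describes, not the actual lexer code. The table below is a shortened, illustrative list (the real 13 entries are not reproduced here), and `strip_numeric_suffix` is a hypothetical stand-in for the suffix-stripping step inside `validate_numeric_body`.

```rust
/// Illustrative subset only; the real table has 13 entries.
const NUMERIC_SUFFIXES: &[&str] = &["i32", "i64", "u32", "u64", "f32", "f64"];

/// Returns the suffix `body` ends with, borrowed from the shared table,
/// so no `String` is allocated per scan.
fn numeric_suffix(body: &str) -> Option<&'static str> {
    NUMERIC_SUFFIXES
        .iter()
        .copied()
        .find(|suffix| body.ends_with(*suffix))
}

/// Hypothetical helper: strips a known suffix (if any) using the same table,
/// leaving the digits and separators for further validation.
fn strip_numeric_suffix(body: &str) -> &str {
    NUMERIC_SUFFIXES
        .iter()
        .find_map(|suffix| body.strip_suffix(*suffix))
        .unwrap_or(body)
}
```

Because both functions read the same const, a table-driven test that loops over `NUMERIC_SUFFIXES` and checks `1<suffix>` for each entry picks up new suffixes without being edited.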

cubic-dev-ai

No issues found across 1 file

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant L as Lexer
    participant Src as Source Text
    participant NS as NUMERIC_SUFFIXES const
    participant Tok as Token

    Note over L,Tok: Numeric literal scanning with shared suffix const

    L->>Src: rest = &source[pos..]
    L->>L: scan numeric digits & separators
    L->>NS: iterate over suffixes for match
    NS-->>L: suffix (e.g., "i32") as &'static str
    alt suffix found
        L->>Src: advance pos by suffix.len()
        L->>Tok: create token with &'static suffix (no String alloc)
    else no suffix
        L->>Tok: create token without suffix
    end

    Note over L: Later, validate_numeric_body uses same const
    L->>NS: find_map(f: strip suffix from body)
    NS-->>L: suffix to strip (if present)
    L->>L: validate separators in remainder

    Note over L,Tok: NEW: test covers all suffixes
    Test->>L: lex("1i32")
    L->>NS: iterate suffixes
    NS-->>L: "i32"
    L-->>Test: Token(IntLit, "1i32")
    Test->>Test: assert kind and lexeme

Auto-approved: Refactors numeric suffix handling into a shared const and eliminates per-scan allocations; adds a test covering all suffixes.

Every token carried an owned copy of its source text, so lexing a file allocated one String per token even though the source buffer already holds the same bytes. Token is now just a kind and a span; the new `Token::text(source)` slices the original buffer on demand. The parser threads the source string through and derives text only at the few places that need it (identifiers, literals, the `vec` head, the extern target). Unary and binary operator text now comes from the token kind rather than the lexeme, since the kind already determines it.
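The span-plus-slice idea can be sketched as follows. The `Span` field names and the three token kinds are assumptions for illustration; only `Token { kind, span }` and `Token::text(source)` are named in the description above.

```rust
/// Hypothetical token kinds, just enough for the example.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TokenKind {
    Ident,
    Plus,
    IntLit,
}

/// Byte range into the source buffer (field names are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// A token no longer owns its text: it is a kind plus a span.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    span: Span,
}

impl Token {
    /// Slices the original buffer on demand. The returned `&str` borrows
    /// from `source`, not from the token, so nothing is allocated.
    fn text<'src>(&self, source: &'src str) -> &'src str {
        &source[self.span.start..self.span.end]
    }
}
```

For the source `foo + 42`, tokens with spans `0..3`, `4..5`, and `6..8` yield `foo`, `+`, and `42`. The trade-off is that every caller needing text must have the source string in hand, which is why the parser threads it through.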

cubic-dev-ai

0 issues found across 2 files (changes from recent commits).

Requires human review: Refactors token representation in lexer and parser to use span-based text slicing; moderate risk due to core data structure change.

With the lexeme gone, a token is a payload-free kind plus a span — both trivially copyable. Deriving Copy lets every parse loop pass and return tokens by value, so the `.clone()` calls sprinkled through `at`, `eat`, `expect`, `bump`, `peek_kind`, and the recovery helpers all disappear. The PR #71 roadmap sketched this slot as "`at`/`eat`/`expect` take `&TokenKind`"; deriving Copy reaches the same goal (no clones in parse loops) with by-value call sites instead of reference threading, which only became possible after the span-slicing change landed.
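A rough illustration of what by-value call sites look like once the types are `Copy`. The `Parser` layout, the `Eof` fallback, and the helper bodies are assumptions; only the helper names `at`, `eat`, and `bump` come from the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Ident,
    Comma,
    Eof,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Payload-free kind plus span: trivially copyable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    span: Span,
}

/// Hypothetical parser cursor over a token slice.
struct Parser<'t> {
    tokens: &'t [Token],
    pos: usize,
}

impl<'t> Parser<'t> {
    /// Returns the current token by value; past the end it yields a synthetic Eof.
    fn peek(&self) -> Token {
        let eof = Token { kind: TokenKind::Eof, span: Span { start: 0, end: 0 } };
        self.tokens.get(self.pos).copied().unwrap_or(eof)
    }

    /// Takes the kind by value: no `&TokenKind` threading, no `.clone()`.
    fn at(&self, kind: TokenKind) -> bool {
        self.peek().kind == kind
    }

    /// Consumes and returns the current token by value.
    fn bump(&mut self) -> Token {
        let token = self.peek();
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
        token
    }

    /// Consumes the current token only if it has the expected kind.
    fn eat(&mut self, kind: TokenKind) -> bool {
        if self.at(kind) {
            self.bump();
            true
        } else {
            false
        }
    }
}
```

This only works because no variant or field owns heap data; a single `String` payload anywhere in `Token` or `TokenKind` would make the `Copy` derive fail to compile.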

cubic-dev-ai

0 issues found across 2 files (changes from recent commits).

Requires human review: This PR refactors the lexer to store tokens as span-based slices and the parser to use them, along with other internal improvements. Although no behavior changes are intended, such refactors have a broad impact across core data structures and code paths, making human review necessary.

`ExprKind::Unary`/`Binary` stored their operator as an owned String, which allocated per node and let any string masquerade as an operator. Dedicated enums make illegal operators unrepresentable, shrink the nodes, and give match exhaustiveness checking to every consumer. `=` never reaches the AST (it builds `ExprKind::Assign`), so the parser classifies infix tokens with a private `Infix { Assign, Op(BinaryOp) }` wrapper instead of widening the public enum with a variant no AST node can carry. `as_str()`/`Display` on both enums recover the source spelling for diagnostics and the future pretty-printer (#67).
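The split between the public operator enum and the private infix classifier might look roughly like this. The variant lists, the `TokenKind` names, and the `classify_infix` function are illustrative assumptions; `BinaryOp`, `as_str()`, `Display`, and `Infix { Assign, Op(BinaryOp) }` are the names used above.

```rust
use std::fmt;

/// Operators an AST node can carry (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinaryOp {
    Add,
    Sub,
    Mul,
    EqEq,
}

impl BinaryOp {
    /// Recovers the source spelling, e.g. for diagnostics.
    pub fn as_str(self) -> &'static str {
        match self {
            BinaryOp::Add => "+",
            BinaryOp::Sub => "-",
            BinaryOp::Mul => "*",
            BinaryOp::EqEq => "==",
        }
    }
}

impl fmt::Display for BinaryOp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

/// Hypothetical token kinds for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Plus,
    Minus,
    Star,
    EqEq,
    Eq,
    Semi,
}

/// Parser-private wrapper: `=` is an infix token but never a `BinaryOp`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Infix {
    Assign,
    Op(BinaryOp),
}

/// Classifies a token as an infix operator, or `None` if it is not one.
fn classify_infix(kind: TokenKind) -> Option<Infix> {
    match kind {
        TokenKind::Eq => Some(Infix::Assign),
        TokenKind::Plus => Some(Infix::Op(BinaryOp::Add)),
        TokenKind::Minus => Some(Infix::Op(BinaryOp::Sub)),
        TokenKind::Star => Some(Infix::Op(BinaryOp::Mul)),
        TokenKind::EqEq => Some(Infix::Op(BinaryOp::EqEq)),
        TokenKind::Semi => None,
    }
}
```

Both matches are exhaustive with no wildcard arm, so adding a variant to either enum produces a compile error at every consumer that has not handled it.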

cubic-dev-ai

0 issues found across 2 files (changes from recent commits).

Requires human review: Introduces structural changes in AST, lexer, and parser (enums, span-based tokens, Copy); risk of subtle breakage across multiple crates.

`@mailbox(capacity: 2048)` and `@supervisor(strategy: "one_for_one")` lost their arguments at parse time — the parser skipped everything inside the parens — and attributes on actor handlers were dropped outright. Nothing downstream (semantic analysis, diagnostics) could ever see them. `Attribute` now carries `args: Vec<AttrArg>` plus a span covering `@` through the closing paren, with `AttrArg` mirroring the §16 grammar (`attr_arg = IDENT [ ":" expr | "=" expr | "(" expr ")" ] | expr`). Only the `IDENT ":"` form needs lookahead; the `=` and call forms are valid expressions, so they parse as expressions and canonicalize to the most specific attr shape afterwards. Actor handlers become `Handler { attrs, function }`, since a handler is a `fn_def` and §16 puts attrs on `fn_def` itself. The now-unused `skip_balanced_after_open` helper is removed.
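One way to picture the parse-then-canonicalize step is sketched below. Everything here is an assumption made for illustration: the tiny `Expr` type, the `AttrArg` variant names, the tuple span, and the `canonicalize` function are not the actual AST. Only the five grammar shapes and the `Attribute` fields `args` and `span` come from the description above.

```rust
/// Stand-in expression type (the real AST is much richer).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Ident(String),
    Int(i64),
    Str(String),
    Assign { target: Box<Expr>, value: Box<Expr> },
    Call { callee: Box<Expr>, args: Vec<Expr> },
}

/// One variant per shape in the `attr_arg` grammar (names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum AttrArg {
    Flag(String),                          // IDENT
    Named { name: String, value: Expr },   // IDENT ":" expr (needs lookahead)
    Assign { name: String, value: Expr },  // IDENT "=" expr
    Call { name: String, arg: Expr },      // IDENT "(" expr ")"
    Expr(Expr),                            // any other expr
}

/// Attribute with preserved arguments; `span` covers `@` through `)`.
#[derive(Debug, Clone, PartialEq)]
struct Attribute {
    name: String,
    args: Vec<AttrArg>,
    span: (usize, usize),
}

/// Maps an already-parsed expression to the most specific attr shape.
/// The `Named` form never reaches this function: the parser builds it
/// directly after peeking for `IDENT ":"`.
fn canonicalize(expr: Expr) -> AttrArg {
    match expr {
        Expr::Ident(name) => AttrArg::Flag(name),
        Expr::Assign { target, value } => {
            let target = *target;
            match target {
                Expr::Ident(name) => AttrArg::Assign { name, value: *value },
                other => AttrArg::Expr(Expr::Assign { target: Box::new(other), value }),
            }
        }
        Expr::Call { callee, mut args } => {
            let callee = *callee;
            match callee {
                Expr::Ident(name) if args.len() == 1 => {
                    let arg = args.remove(0);
                    AttrArg::Call { name, arg }
                }
                other => AttrArg::Expr(Expr::Call { callee: Box::new(other), args }),
            }
        }
        other => AttrArg::Expr(other),
    }
}
```

Under these assumptions, `strategy = "one_for_one"` parses as an assignment expression and canonicalizes to `AttrArg::Assign`, while anything that does not fit a named shape falls through to `AttrArg::Expr` unchanged.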

The crates/AGENTS.md rule is "add corpus tests for every grammar shape accepted", but extern blocks, onchain modules, use trees, casts, the literal zoo, async/.await, `?` outside pipes, attribute arguments, and struct-literal shapes had no fixtures. Six new fixtures close those gaps. The harness now discovers `tests/corpus/*.sp` instead of maintaining a hard-coded list, so a new fixture cannot be silently skipped; an is-empty guard catches a moved or emptied corpus directory.
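Fixture auto-discovery with an emptiness guard could be sketched like this, using only the standard library. The function name and the choice to return an `io::Error` (rather than assert inside the harness) are assumptions; the real harness may be structured differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects every `*.sp` file directly under `dir`, sorted for stable test order.
/// Returns an error if none are found, so a moved or emptied corpus directory
/// fails loudly instead of passing with zero fixtures.
fn discover_fixtures(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut fixtures = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_sp = path.extension().and_then(|ext| ext.to_str()) == Some("sp");
        if path.is_file() && is_sp {
            fixtures.push(path);
        }
    }
    fixtures.sort();
    if fixtures.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("no .sp fixtures found in {}", dir.display()),
        ));
    }
    Ok(fixtures)
}
```

A harness would call this once with the corpus path (`tests/corpus` here) and run the parser over each returned file, so dropping a new `.sp` file into the directory is enough to get it tested.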

StreamDemon force-pushed the feature/parser-enhancements-base branch from c04bde5 to c7bb5a3 on July 2, 2026 08:23

Open parser enhancements integration branch

1a01d83

StreamDemon force-pushed the feature/parser-enhancements-base branch from c7bb5a3 to 1a01d83 on July 2, 2026 08:24

StreamDemon mentioned this pull request Jul 2, 2026

Dedupe the lexer numeric-suffix list #75

Merged

6 tasks

StreamDemon mentioned this pull request Jul 2, 2026

Store spans instead of lexeme strings on tokens #76

Merged

6 tasks

cubic-dev-ai Bot previously approved these changes Jul 2, 2026

StreamDemon dismissed cubic-dev-ai[bot]’s stale review via 3e65f7f July 2, 2026 11:16

StreamDemon mentioned this pull request Jul 2, 2026

Make Token and TokenKind Copy, drop parse-loop clones #77

Merged

6 tasks

cubic-dev-ai Bot reviewed Jul 2, 2026

StreamDemon mentioned this pull request Jul 2, 2026

Parser: spurious 'expected identifier' for non-ident-headed type args (Vec<&str>, tuples, arrays) #78

Closed

StreamDemon mentioned this pull request Jul 2, 2026

Replace operator strings with UnaryOp and BinaryOp enums #79

Merged

6 tasks

cubic-dev-ai Bot reviewed Jul 2, 2026

StreamDemon mentioned this pull request Jul 2, 2026

Preserve attribute arguments and actor-handler attrs in the AST #80

Merged

6 tasks

cubic-dev-ai Bot reviewed Jul 2, 2026

StreamDemon mentioned this pull request Jul 2, 2026

Expand the corpus to cover every accepted grammar shape #81

Merged

6 tasks

StreamDemon marked this pull request as ready for review July 2, 2026 11:47

StreamDemon merged commit 292c519 into main Jul 2, 2026
3 checks passed

StreamDemon deleted the feature/parser-enhancements-base branch July 2, 2026 11:48

StreamDemon mentioned this pull request Jul 2, 2026

Fix spurious error on non-ident-headed type arguments #82

Merged

6 tasks

StreamDemon merged 7 commits into main from feature/parser-enhancements-base
