perf: use unsynchronized StringBuilderWriter in TomlRenderer #875
Merged
stephenamar-db merged 2 commits on Jun 3, 2026
This was referenced May 30, 2026
stephenamar-db pushed a commit that referenced this pull request on Jun 2, 2026
## Motivation

`std.manifestJson`, `std.manifestJsonMinified`, and `std.manifestJsonEx` routed through `java.io.StringWriter`, paying `StringBuffer` synchronization per `write`/`flush` on the hot manifestation path. Source-built jrsonnet comparisons showed sjsonnet trailing on object-heavy manifest workloads.

## Modification

- Add `StringBuilderWriter`: an unsynchronized `Writer` over a `StringBuilder`.
- Add package-private `FastMaterializeJsonRenderer` backed by `StringBuilderWriter`; route the three `std.manifestJson*` builtins through it. Public `MaterializeJsonRenderer` ABI/shape unchanged.
- Use an in-place codepoint sort for `sortedVisibleKeyNames` / `maybeSortKeys` (avoids `.sorted` boxing).
- Fix codepoint comparison for raw surrogate prefixes; `UnicodeHandlingTests` extended.

## Result

Scala Native hyperfine on kube-prometheus, jrsonnet HEAD `2d7eed05`:

| Workload (native) | Before | After | Δ |
|---|---:|---:|---:|
| kube-prometheus, sjsonnet | 158.4 ± 16.8 ms | 143.7 ± 3.2 ms | **−9.3%** |
| `manifestJsonEx`, sjsonnet | n/a | 5.09 ± 1.01 ms | new |

## Test plan

- [x] `./mill __.reformat`
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test`: 518/518 pass

This PR is the base for the stacked follow-up #875 (TomlRenderer reuses `StringBuilderWriter`) and for the independent #876/#877/#878.
std.manifestTomlEx routed through java.io.StringWriter, whose backing StringBuffer pays a monitor enter/exit on every write/flush on the hot TOML manifestation path. Switch TomlRenderer and the manifestTomlEx render path in ManifestModule to the unsynchronized package-private StringBuilderWriter (the same writer the JSON manifest renderer uses). Output is byte-identical; std.deepJoin keeps StringWriter (separate concern). Result (Scala Native hyperfine, TOML-heavy workload, ~1.8 MB output): after ran 1.11 ± 0.07x faster than before (~10%); output byte-identical.
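To make the change concrete, here is a minimal sketch of an unsynchronized `Writer` over a `StringBuilder`. It is not the sjsonnet `StringBuilderWriter` source; the class name `UnsyncStringWriter`, its constructor parameter, and the demo object are hypothetical and only illustrate the idea of appending straight into a `java.lang.StringBuilder` with no monitor involved.

```scala
import java.io.Writer

// Hypothetical sketch, not the sjsonnet implementation: a Writer that appends
// directly to a java.lang.StringBuilder, so no write takes a lock.
final class UnsyncStringWriter(initialCapacity: Int = 16) extends Writer {
  private val sb = new java.lang.StringBuilder(initialCapacity)

  // Writer.write(Int) writes the low 16 bits as a single char.
  override def write(c: Int): Unit = { sb.append(c.toChar); () }

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    sb.append(cbuf, off, len); ()
  }

  override def write(str: String): Unit = { sb.append(str); () }

  // StringBuilder.append(CharSequence, start, end) takes an end index, not a length.
  override def write(str: String, off: Int, len: Int): Unit = {
    sb.append(str, off, off + len); ()
  }

  // Nothing is buffered outside the StringBuilder, so these are no-ops.
  override def flush(): Unit = ()
  override def close(): Unit = ()

  override def toString: String = sb.toString
}

object UnsyncStringWriterDemo {
  def main(args: Array[String]): Unit = {
    val w = new UnsyncStringWriter(1024) // pre-sized, as the second commit does at the entry point
    w.write("name = ")
    w.write("\"sjsonnet\"")
    w.write("\n")
    print(w.toString)
  }
}
```

A writer like this is only safe when a single thread owns it for the whole render, which is the situation the commit message describes.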
933ed41 to e327ba2
Contributor, Author: @stephenamar-db rebased
…leInternal
Each TOML table iteration was doing redundant work for every key:
* v.value(k) was called twice — once to classify scalar vs section, then again
to render or recurse. The cache deduplicates the result but the lookup itself
still costs.
* visibleKeyNames was iterated and each key binary-searched back into
sortedVisibleKeyNames. Iterating sortedVisibleKeyNames directly is simpler
and skips O(n log n) compares per table.
* childIndent (cumulatedIndent + indent) was allocated inside the section loop
once per section, all producing the same String for sibling sections.
Also pre-size the output StringBuilderWriter to 1 KiB at the manifestTomlEx
entry point so small/medium outputs skip the first few StringBuilder doublings.
Output byte-identical (no behavior change).
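The three loop changes above can be sketched on a toy model. This is an illustration under stated assumptions, not the `TomlRenderer` code: `TableLoopSketch`, the `Table` alias (nested `Map`s stand in for sections), `renderScalar`, and the output layout are all hypothetical, and the keys are sorted with the default `String` ordering rather than sjsonnet's codepoint sort.

```scala
object TableLoopSketch {
  // Hypothetical stand-in for a Jsonnet object: a nested Map is a section,
  // any other value is treated as a scalar.
  type Table = Map[String, Any]

  // Simplified scalar rendering (no TOML escaping); for illustration only.
  private def renderScalar(v: Any): String = v match {
    case s: String => "\"" + s + "\""
    case other     => other.toString
  }

  def render(
      table: Table,
      path: List[String],
      cumulatedIndent: String,
      indent: String,
      out: java.lang.StringBuilder
  ): Unit = {
    // Stand-in for sortedVisibleKeyNames: iterate the sorted keys directly.
    val sortedKeys: Array[String] = table.keys.toArray.sorted

    // Resolve every field exactly once and reuse the result in both passes.
    val resolved = new Array[Any](sortedKeys.length)
    var i = 0
    while (i < sortedKeys.length) {
      resolved(i) = table(sortedKeys(i))
      i += 1
    }

    // Pass 1: scalars of this table.
    i = 0
    while (i < sortedKeys.length) {
      resolved(i) match {
        case _: Map[_, _] => () // section, handled in pass 2
        case scalar =>
          out.append(cumulatedIndent).append(sortedKeys(i)).append(" = ")
            .append(renderScalar(scalar)).append('\n')
      }
      i += 1
    }

    // Hoisted: the same String serves every sibling section.
    val childIndent = cumulatedIndent + indent

    // Pass 2: sections, recursing with the shared childIndent.
    i = 0
    while (i < sortedKeys.length) {
      resolved(i) match {
        case section: Map[_, _] =>
          val childPath = path :+ sortedKeys(i)
          out.append('\n').append(cumulatedIndent).append('[')
            .append(childPath.mkString(".")).append("]\n")
          render(section.asInstanceOf[Table], childPath, childIndent, indent, out)
        case _ => ()
      }
      i += 1
    }
  }

  def renderToString(table: Table, indent: String): String = {
    val out = new java.lang.StringBuilder(1024)
    render(table, Nil, "", indent, out)
    out.toString
  }
}
```

The `resolved` array trades one small allocation per table for halving the number of field lookups, which matches the trade-off the commit message describes.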
stephenamar-db pushed a commit that referenced this pull request on Jun 3, 2026
## Motivation

`std.deepJoin` writes each `Val.Str` chunk into a `java.io.StringWriter` inside a tight loop. `StringWriter`'s backing `StringBuffer` pays a monitor enter/exit on every `write`/`append` call, which on a typical deepJoin walk over a deeply nested array can be hundreds of thousands of synchronized writes: wasted overhead in single-threaded jsonnet evaluation.

`TomlRenderer` and `FastMaterializeJsonRenderer` already use the unsynchronized package-private `StringBuilderWriter` for the same reason (#874, #875). `std.deepJoin` was explicitly left as a follow-up in #875's description (*"std.deepJoin keeps StringWriter (separate concern)"*); this PR is that follow-up.

## Modification

Single change in `ManifestModule.scala`: swap the `new StringWriter()` in `DeepJoin.evalRhs` for `new StringBuilderWriter()`. No other code changes; output is byte-identical.

## Result

Scala Native, hyperfine A/B against `master` (`fc292fa6`). Workload: a 50,000-row array of 10 pre-allocated strings → 2 MB of `deepJoin` output, render-dominated. Four interleaved-order passes, `--warmup 10 --min-runs 100 --shell=none`:

| pass | order | before mean | after mean | before min | after min | **min ratio** |
|---|---|---:|---:|---:|---:|---:|
| 1 | before → after | 35.1 ± 16.5 ms | 32.2 ± 19.1 ms | 23.1 ms | 18.7 ms | **1.24×** |
| 2 | after → before | 43.7 ± 30.6 ms | 29.9 ± 25.3 ms | 25.7 ms | 20.3 ms | **1.27×** |
| 3 | before → after | 30.3 ± 8.5 ms | 29.5 ± 7.1 ms | 24.6 ms | 20.8 ms | **1.18×** |
| 4 | after → before | 32.6 ± 7.6 ms | 28.0 ± 6.8 ms | 24.0 ms | 20.7 ms | **1.16×** |

After is faster in every one of the 4 passes; mean is noisy on the host but min values are tight at **1.16–1.27× faster** (best observed 18.7 vs 23.1 ms, ~19% reduction). Output byte-identical (2,000,000 bytes both sides).

## Test plan

- [x] `./mill __.reformat`
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test`: 519/519 pass
- [x] Scala Native A/B hyperfine: 4 interleaved-order passes, all positive; output byte-identical

---

> Independent of #875; can land in either order. After both land, the `import java.io.StringWriter` in `ManifestModule.scala` can be removed in a small cleanup.
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request on Jun 4, 2026
… stdlib

Motivation:
Continues the synchronous StringWriter elimination started by PR databricks#875 (TomlRenderer), databricks#889 (std.deepJoin), and the existing commit in this PR (escapeStringJson/Python). Remaining hot paths still allocate a java.io.StringWriter with its synchronized StringBuffer.

Modification:
- StringBuilderWriter: expose getBuilder for direct StringBuilder access (YamlRenderer trims trailing spaces via length/charAt/setLength).
- Renderer / PythonRenderer: default `out` constructor parameter changes from java.io.StringWriter to StringBuilderWriter. Format.scala's complex-type Renderer paths and Val.Obj.renderString automatically benefit via the new default.
- YamlRenderer: full visitor type-parameter migration from StringWriter to StringBuilderWriter; outBuffer switches from StringBuffer to StringBuilder (same length/setLength/charAt interface).
- PrettyYamlRenderer: default `out` parameter to StringBuilderWriter; also drops a dead StringWriter allocation in the quoted-string branch (the writer's result was never consumed; `quotedStr` is used instead).
- std.escapeStringXML: replace internal StringWriter with StringBuilderWriter, pre-sized to input length + 16.
- bench: MaterializerBenchmark switches its renderWith helper to StringBuilderWriter to match the new constructor signatures.

Result:
- Eliminates StringBuffer monitor acquisition on every char/string write across all remaining renderers (Yaml/PrettyYaml/Python/Json default paths) and std.escapeStringXML.
- Removes one dead StringWriter allocation per quoted-string in PrettyYamlRenderer.
- All JVM tests pass (sjsonnet.jvm[3.3.7].test).
Motivation

`std.manifestTomlEx` had three sources of avoidable overhead on the hot manifestation path:

- `TomlRenderer` and `ManifestModule.evalRhs` rendered into a `java.io.StringWriter`, whose backing `StringBuffer` pays a monitor enter/exit on every `write`/`flush`. The `FastMaterializeJsonRenderer` already uses the unsynchronized `StringBuilderWriter` (perf: speed up manifest JSON rendering #874); TOML did not.
- In `renderTableInternal`, each key's `Val.Obj.value(k)` was resolved twice: once to classify scalar vs section, then again to render or recurse. The cache deduplicates the result, but the lookup itself still costs.
- `visibleKeyNames` was iterated and each key binary-searched back into `sortedVisibleKeyNames`. `sortedVisibleKeyNames` can be iterated directly, skipping `O(n log n)` compares per table.

Modification
Two commits:

1. `perf: use unsynchronized StringBuilderWriter in TomlRenderer`: swap `TomlRenderer` and the `manifestTomlEx` render path in `ManifestModule` from `java.io.StringWriter` to the package-private `StringBuilderWriter`. `std.deepJoin` keeps `StringWriter` (separate concern).
2. `perf: cache resolved field values and skip binary search in renderTableInternal`: resolve each field once into a `resolved: Array[Val]` during section classification and reuse it during render/recurse; iterate `sortedVisibleKeyNames` directly (removes the now-unused `sortedKeyIndex` binary search); hoist `childIndent = cumulatedIndent + indent` out of the section loop (was an identical allocation per sibling section); pre-size the output `StringBuilderWriter` to 1 KiB so small/medium outputs skip the first ~6 doublings.

Output is byte-identical (verified at 1,228,186 bytes on the benchmark workload).
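The "~6 doublings" figure can be checked with a short calculation. The sketch below assumes the builder starts at the JDK default capacity of 16 chars and grows from capacity `c` to `2 * c + 2`, as OpenJDK's `StringBuilder` does for small appends; whether Scala Native's javalib follows exactly the same policy is an assumption here. `GrowthSketch` and `growthSteps` are hypothetical names.

```scala
object GrowthSketch {
  // Counts the growth steps needed before a buffer of `initial` chars can hold
  // `target` chars, assuming each step takes capacity c to 2 * c + 2
  // (OpenJDK-style policy; an assumption for other runtimes).
  def growthSteps(initial: Int, target: Int): Int = {
    var cap = initial
    var steps = 0
    while (cap < target) {
      cap = cap * 2 + 2
      steps += 1
    }
    steps
  }

  def main(args: Array[String]): Unit = {
    // Capacities visited: 16 -> 34 -> 70 -> 142 -> 286 -> 574 -> 1150
    println(growthSteps(16, 1024)) // prints 6
  }
}
```

Each skipped step is one array allocation plus a copy of everything written so far, so the saving is small per render but applies to every call.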
Result

Scala Native, hyperfine A/B against `master` (`fc292fa6`). Workload: object comprehension over 8000 small tables → ~1.2 MB TOML output (render-dominated). Four interleaved-order passes, `--warmup 10 --min-runs 100 --shell=none`.

Mean is noisy on the host (1.12× – 1.29×), but after is faster in every one of the 4 passes and the min values are tight at ~1.27–1.34× faster (best observed: 42.0 ms vs 56.4 ms, ~25.5% reduction). Output byte-identical, 1,228,186 bytes both sides.

For comparison, the `StringBuilderWriter` swap alone (commit 1) measures ~1.08–1.14× min; the cache + binary-search elimination + `childIndent` hoist (commit 2) lifts that to ~1.27–1.34× min.
Test plan

- `./mill __.reformat`
- `./mill 'sjsonnet.jvm[3.3.7]'.test`: 519/519 pass