perf: use unsynchronized StringBuilderWriter in TomlRenderer#875

Merged
stephenamar-db merged 2 commits into databricks:master from He-Pin:perf/toml-stringbuilder-writer
Jun 3, 2026
@He-Pin He-Pin commented May 30, 2026

Motivation

std.manifestTomlEx had three sources of avoidable overhead on the hot manifestation path:

  1. Synchronized writer. TomlRenderer and ManifestModule.evalRhs rendered into a java.io.StringWriter, whose backing StringBuffer pays a monitor enter/exit on every write/flush. The FastMaterializeJsonRenderer already uses the unsynchronized StringBuilderWriter (perf: speed up manifest JSON rendering #874); TOML did not.
  2. Redundant field lookups in renderTableInternal. Each key's Val.Obj.value(k) was resolved twice — once to classify scalar vs section, then again to render or recurse. The cache deduplicates the result, but the lookup itself still costs.
  3. Wasted indexing work. visibleKeyNames was iterated and each key binary-searched back into sortedVisibleKeyNames. sortedVisibleKeyNames can be iterated directly, skipping O(n log n) compares per table.
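To make point 1 concrete, here is a minimal Java sketch of an unsynchronized writer over a StringBuilder. It is not the project's code: sjsonnet is written in Scala, and the class name `UnsyncWriterSketch` and the `render` helper are hypothetical stand-ins for illustration only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for an unsynchronized StringBuilder-backed Writer.
// The real StringBuilderWriter in sjsonnet may differ in shape and API.
final class UnsyncWriterSketch extends Writer {
    private final StringBuilder sb;

    UnsyncWriterSketch(int initialCapacity) {
        this.sb = new StringBuilder(initialCapacity);
    }

    // Every write goes straight to the StringBuilder: no monitor is taken.
    @Override public void write(int c) { sb.append((char) c); }
    @Override public void write(char[] buf, int off, int len) { sb.append(buf, off, len); }
    @Override public void write(String s) { sb.append(s); }
    @Override public void write(String s, int off, int len) { sb.append(s, off, off + len); }
    @Override public void flush() { /* nothing is buffered elsewhere */ }
    @Override public void close() { /* nothing to release */ }
    @Override public String toString() { return sb.toString(); }

    // Renders the same tiny fragment into any Writer, so both sinks can be compared.
    static String render(Writer out) {
        try {
            out.write("[table]\n");
            out.write("key = ");
            out.write('1');
            out.write('\n');
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String viaBuffer = render(new StringWriter());            // StringBuffer-backed
        String viaBuilder = render(new UnsyncWriterSketch(1024)); // StringBuilder-backed
        System.out.println(viaBuffer.equals(viaBuilder));
    }
}
```

Both sinks produce the same characters; the only difference is that the StringBuilder-backed one skips the per-call synchronization, which is only safe when a single thread writes to it.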

Modification

Two commits:

  • perf: use unsynchronized StringBuilderWriter in TomlRenderer — Swap TomlRenderer and the manifestTomlEx render path in ManifestModule from java.io.StringWriter to the package-private StringBuilderWriter. std.deepJoin keeps StringWriter (separate concern).
  • perf: cache resolved field values and skip binary search in renderTableInternal — Resolve each field once into a resolved: Array[Val] during section classification and reuse it during render/recurse; iterate sortedVisibleKeyNames directly (removes the now-unused sortedKeyIndex binary search); hoist childIndent = cumulatedIndent + indent out of the section loop (was an identical allocation per sibling section); pre-size the output StringBuilderWriter to 1 KiB so small/medium outputs skip the first ~6 doublings.
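The second commit's shape can be sketched roughly as follows, in Java rather than the project's Scala. Everything here is hypothetical: `TableRenderSketch`, the `Map`-based object model, and the simplified output format stand in for `Val.Obj`, `renderTableInternal`, and real TOML rendering. The sketch only shows the three ideas: resolve each field once into an array, iterate the sorted keys directly, and compute `childIndent` once per table.

```java
import java.util.Map;
import java.util.TreeMap;

final class TableRenderSketch {
    // Counts field resolutions so the "resolve once" property is observable.
    static int lookups = 0;

    // Stand-in for a field lookup that is cheap but not free.
    static Object resolve(Map<String, Object> obj, String key) {
        lookups++;
        return obj.get(key);
    }

    @SuppressWarnings("unchecked")
    static void renderTable(Map<String, Object> obj, String path,
                            String cumulatedIndent, String indent, StringBuilder out) {
        // TreeMap keys are already sorted, so they are iterated directly.
        String[] sortedKeys = obj.keySet().toArray(new String[0]);
        Object[] resolved = new Object[sortedKeys.length];

        // Pass 1: resolve each field exactly once and render the scalars.
        for (int i = 0; i < sortedKeys.length; i++) {
            resolved[i] = resolve(obj, sortedKeys[i]);
            if (!(resolved[i] instanceof Map)) {
                out.append(cumulatedIndent).append(sortedKeys[i])
                   .append(" = ").append(resolved[i]).append('\n');
            }
        }

        // Hoisted: identical for every sibling section of this table.
        String childIndent = cumulatedIndent + indent;

        // Pass 2: recurse into sections, reusing the cached values.
        for (int i = 0; i < sortedKeys.length; i++) {
            if (resolved[i] instanceof Map) {
                String childPath = path.isEmpty() ? sortedKeys[i] : path + "." + sortedKeys[i];
                out.append('\n').append(cumulatedIndent).append('[').append(childPath).append("]\n");
                renderTable((Map<String, Object>) resolved[i], childPath, childIndent, indent, out);
            }
        }
    }

    // Small fixed input: one scalar and one nested section.
    static String demo() {
        lookups = 0;
        Map<String, Object> child = new TreeMap<>();
        child.put("b", 2);
        Map<String, Object> root = new TreeMap<>();
        root.put("a", 1);
        root.put("t", child);
        StringBuilder out = new StringBuilder(1024);
        renderTable(root, "", "", "  ", out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
        System.out.println("lookups: " + lookups);
    }
}
```

With three fields in total, the lookup counter ends at three; a classify-then-render approach that resolves twice would count six.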

Output is byte-identical (verified at 1,228,186 bytes on the benchmark workload).

Result

Scala Native, hyperfine A/B against master (fc292fa6). Workload: object comprehension over 8000 small tables → ~1.2 MB TOML output (render-dominated). Four interleaved-order passes, --warmup 10 --min-runs 100 --shell=none:

| pass | order | before mean | after mean | before min | after min | min ratio |
|---|---|---:|---:|---:|---:|---:|
| 1 | before → after | 59.4 ± 2.7 ms | 53.2 ± 23.4 ms | 55.4 ms | 43.8 ms | 1.27× |
| 2 | after → before | 64.1 ± 7.7 ms | 51.8 ± 12.2 ms | 56.4 ms | 43.7 ms | 1.29× |
| 3 | before → after | 64.1 ± 8.1 ms | 53.2 ± 14.3 ms | 56.4 ms | 42.0 ms | 1.34× |
| 4 | after → before | 63.3 ± 14.3 ms | 49.2 ± 3.7 ms | 57.2 ms | 42.8 ms | 1.34× |

Mean is noisy on the host (1.12× – 1.29×), but after is faster in every one of the 4 passes and the min values are tight at ~1.27–1.34× faster (best observed: 42.0 ms vs 56.4 ms, ~25.5% reduction). Output byte-identical, 1,228,186 bytes both sides.
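For readers checking the figures, the "min ratio" and the percentage reduction are two views of the same pair of numbers. The Java sketch below recomputes them for the best observed pair (56.4 ms before, 42.0 ms after); the class and method names are made up for illustration.

```java
final class MinRatioSketch {
    // Speedup factor: how many times faster "after" is than "before".
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    // Wall-time reduction as a percentage of the "before" time.
    static double reductionPercent(double beforeMs, double afterMs) {
        return (1.0 - afterMs / beforeMs) * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("%.2fx faster, %.1f%% reduction%n",
                speedup(56.4, 42.0), reductionPercent(56.4, 42.0));
    }
}
```

A 1.34× speedup and a roughly 25.5% reduction are therefore the same result, not two separate claims.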

For comparison, the StringBuilderWriter swap alone (commit 1) measures ~1.08–1.14× min; the cache + binary-search elimination + childIndent hoist (commit 2) lifts that to ~1.27–1.34× min.

Test plan

  • ./mill __.reformat
  • ./mill 'sjsonnet.jvm[3.3.7]'.test — 519/519 pass
  • Scala Native A/B hyperfine — 4 interleaved-order passes, all positive; output byte-identical

Rebased onto current master (fc292fa6). The companion commit "speed up manifest JSON rendering" was merged separately as #879, so this PR now contains only the TomlRenderer / ManifestModule changes.

@He-Pin He-Pin marked this pull request as draft May 30, 2026 12:48
stephenamar-db pushed a commit that referenced this pull request Jun 2, 2026
## Motivation

`std.manifestJson`, `std.manifestJsonMinified`, and `std.manifestJsonEx`
routed through `java.io.StringWriter`, paying `StringBuffer`
synchronization per `write`/`flush` on the hot manifestation path.
Source-built jrsonnet comparisons showed sjsonnet trailing on
object-heavy manifest workloads.

## Modification

- Add `StringBuilderWriter`: an unsynchronized `Writer` over a
`StringBuilder`.
- Add package-private `FastMaterializeJsonRenderer` backed by
`StringBuilderWriter`; route the three `std.manifestJson*` builtins
through it. Public `MaterializeJsonRenderer` ABI/shape unchanged.
- Use an in-place codepoint sort for `sortedVisibleKeyNames` /
`maybeSortKeys` (avoids `.sorted` boxing).
- Fix codepoint comparison for raw surrogate prefixes;
`UnicodeHandlingTests` extended.
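
The codepoint ordering matters because `String.compareTo` compares UTF-16 code units, which disagrees with codepoint order once surrogate pairs are involved. Below is a hedged Java sketch of a codepoint comparator and an object-array sort that avoids boxing; `CodePointOrderSketch` is a hypothetical name, and the project's actual Scala implementation may handle this differently.

```java
import java.util.Arrays;

final class CodePointOrderSketch {
    // Compares two strings by Unicode code point rather than UTF-16 code unit.
    // An unpaired surrogate is compared by its own value, as codePointAt returns it.
    static int compareCodePoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    // Sorts the given array itself; no copy of the keys is returned.
    static void sortKeys(String[] keys) {
        Arrays.sort(keys, CodePointOrderSketch::compareCodePoints);
    }

    public static void main(String[] args) {
        // U+FF5E is one code unit; U+1F600 is the surrogate pair D83D DE00.
        String[] keys = { "\uD83D\uDE00", "\uFF5E" };
        sortKeys(keys);
        System.out.println(Integer.toHexString(keys[0].codePointAt(0)));
    }
}
```

By code unit, `D83D` sorts before `FF5E`, so the emoji would come first; by code point, U+FF5E precedes U+1F600.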

## Result

Scala Native hyperfine on kube-prometheus, jrsonnet HEAD `2d7eed05`:

| Workload (native) | Before | After | Δ |
|---|---:|---:|---:|
| kube-prometheus, sjsonnet | 158.4 ± 16.8 ms | 143.7 ± 3.2 ms | **−9.3%** |
| `manifestJsonEx`, sjsonnet | — | 5.09 ± 1.01 ms | new |

## Test plan

- [x] `./mill __.reformat`
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test` — 518/518 pass

This PR is the base for the stacked follow-ups #875 (TomlRenderer reuses
`StringBuilderWriter`), and the independent #876/#877/#878.
@He-Pin He-Pin marked this pull request as ready for review June 3, 2026 02:57
std.manifestTomlEx routed through java.io.StringWriter, whose backing
StringBuffer pays a monitor enter/exit on every write/flush on the hot TOML
manifestation path. Switch TomlRenderer and the manifestTomlEx render path in
ManifestModule to the unsynchronized package-private StringBuilderWriter (the
same writer the JSON manifest renderer uses). Output is byte-identical;
std.deepJoin keeps StringWriter (separate concern).

Result (Scala Native hyperfine, TOML-heavy workload, ~1.8 MB output):
after ran 1.11 ± 0.07x faster than before (~10%); output byte-identical.
@He-Pin He-Pin force-pushed the perf/toml-stringbuilder-writer branch from 933ed41 to e327ba2 Compare June 3, 2026 03:52

He-Pin commented Jun 3, 2026

@stephenamar-db rebased

@He-Pin He-Pin marked this pull request as draft June 3, 2026 04:35
…leInternal

Each TOML table iteration was doing redundant work for every key:

  * v.value(k) was called twice — once to classify scalar vs section, then again
    to render or recurse. The cache deduplicates the result but the lookup itself
    still costs.
  * visibleKeyNames was iterated and each key binary-searched back into
    sortedVisibleKeyNames. Iterating sortedVisibleKeyNames directly is simpler
    and skips O(n log n) compares per table.
  * childIndent (cumulatedIndent + indent) was allocated inside the section loop
    once per section, all producing the same String for sibling sections.

Also pre-size the output StringBuilderWriter to 1 KiB at the manifestTomlEx
entry point so small/medium outputs skip the first few StringBuilder doublings.

Output byte-identical (no behavior change).
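
The effect of pre-sizing can be observed with a small Java sketch that counts how often a `StringBuilder`'s capacity changes while characters are appended. `PresizeSketch` is a hypothetical helper, and the exact growth policy is a runtime implementation detail (Scala Native may differ from the JVM), so the sketch measures growth instead of assuming a formula.

```java
final class PresizeSketch {
    // Appends n chars and counts how many times the backing capacity changed.
    static int growths(StringBuilder sb, int n) {
        int count = 0;
        int cap = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != cap) {
                count++;
                cap = sb.capacity();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Default capacity (16 chars on the JVM) versus a 1 KiB pre-sized builder.
        System.out.println("default:   " + growths(new StringBuilder(), 1000));
        System.out.println("pre-sized: " + growths(new StringBuilder(1024), 1000));
    }
}
```

Each growth copies the existing contents into a larger array, so a pre-sized builder saves those copies for any output that fits in the initial capacity.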
@He-Pin He-Pin marked this pull request as ready for review June 3, 2026 04:53
@stephenamar-db stephenamar-db merged commit 4ff8081 into databricks:master Jun 3, 2026
5 checks passed
stephenamar-db pushed a commit that referenced this pull request Jun 3, 2026
## Motivation

`std.deepJoin` writes each `Val.Str` chunk into a `java.io.StringWriter`
inside a tight loop. `StringWriter`'s backing `StringBuffer` pays a
monitor enter/exit on every `write`/`append` call, which on a typical
deepJoin walk over a deeply nested array can be hundreds of thousands of
synchronized writes — wasted overhead in single-threaded jsonnet
evaluation.

`TomlRenderer` and `FastMaterializeJsonRenderer` already use the
unsynchronized package-private `StringBuilderWriter` for the same reason
(#874, #875). `std.deepJoin` was explicitly left as a follow-up in
#875's description (*"std.deepJoin keeps StringWriter (separate
concern)"*) — this PR is that follow-up.

## Modification

Single change in `ManifestModule.scala`: swap the `new StringWriter()`
in `DeepJoin.evalRhs` for `new StringBuilderWriter()`. No other code
changes; output is byte-identical.
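
As a rough illustration of why the sink matters here, the Java sketch below walks a nested list of strings and appends every leaf to one unsynchronized `StringBuilder`. It is an assumption-laden model: `DeepJoinSketch` and its `List`/`String` input shape are hypothetical, and the real `DeepJoin.evalRhs` operates on sjsonnet `Val` values through a writer.

```java
import java.util.Arrays;
import java.util.List;

final class DeepJoinSketch {
    // Depth-first walk: every string leaf costs one append on the shared sink.
    static void walk(Object node, StringBuilder out) {
        if (node instanceof String) {
            out.append((String) node);
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                walk(child, out);
            }
        } else {
            throw new IllegalArgumentException("expected a string or a list, got: " + node);
        }
    }

    static String deepJoin(Object node) {
        StringBuilder out = new StringBuilder();
        walk(node, out);
        return out.toString();
    }

    public static void main(String[] args) {
        Object nested = Arrays.asList("a", Arrays.asList("b", Arrays.asList("c")), "d");
        System.out.println(deepJoin(nested));
    }
}
```

Since there is one append per leaf, a synchronized sink would pay one monitor enter/exit per leaf as well, which is the overhead this change removes.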

## Result

Scala Native, hyperfine A/B against `master` (`fc292fa6`). Workload: a
50,000-row array of 10 pre-allocated strings → 2 MB of `deepJoin`
output, render-dominated. Four interleaved-order passes, `--warmup 10
--min-runs 100 --shell=none`:

| pass | order | before mean | after mean | before min | after min | **min ratio** |
|---|---|---:|---:|---:|---:|---:|
| 1 | before → after | 35.1 ± 16.5 ms | 32.2 ± 19.1 ms | 23.1 ms | 18.7 ms | **1.24×** |
| 2 | after → before | 43.7 ± 30.6 ms | 29.9 ± 25.3 ms | 25.7 ms | 20.3 ms | **1.27×** |
| 3 | before → after | 30.3 ± 8.5 ms | 29.5 ± 7.1 ms | 24.6 ms | 20.8 ms | **1.18×** |
| 4 | after → before | 32.6 ± 7.6 ms | 28.0 ± 6.8 ms | 24.0 ms | 20.7 ms | **1.16×** |

After is faster in every one of the 4 passes; mean is noisy on the host
but min values are tight at **1.16–1.27× faster** (best observed 18.7 vs
23.1 ms, ~19% reduction). Output byte-identical (2,000,000 bytes both
sides).

## Test plan

- [x] `./mill __.reformat`
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test` — 519/519 pass
- [x] Scala Native A/B hyperfine — 4 interleaved-order passes, all
positive; output byte-identical

---

> Independent of #875; can land in either order. After both land, the
`import java.io.StringWriter` in `ManifestModule.scala` can be removed
in a small cleanup.
@He-Pin He-Pin deleted the perf/toml-stringbuilder-writer branch June 3, 2026 19:25
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Jun 4, 2026
… stdlib

Motivation:
Continues the synchronized StringWriter elimination started by PR databricks#875
(TomlRenderer), databricks#889 (std.deepJoin), and the existing commit in this PR
(escapeStringJson/Python). Remaining hot paths still allocate a
java.io.StringWriter with its synchronized StringBuffer.

Modification:
- StringBuilderWriter: expose getBuilder for direct StringBuilder access
  (YamlRenderer trims trailing spaces via length/charAt/setLength).
- Renderer / PythonRenderer: default `out` constructor parameter changes
  from java.io.StringWriter to StringBuilderWriter. Format.scala's
  complex-type Renderer paths and Val.Obj.renderString automatically
  benefit via the new default.
- YamlRenderer: full visitor type-parameter migration from StringWriter
  to StringBuilderWriter; outBuffer switches from StringBuffer to
  StringBuilder (same length/setLength/charAt interface).
- PrettyYamlRenderer: default `out` parameter to StringBuilderWriter;
  also drops a dead StringWriter allocation in the quoted-string branch
  (the writer's result was never consumed — `quotedStr` is used instead).
- std.escapeStringXML: replace internal StringWriter with
  StringBuilderWriter, pre-sized to input length + 16.
- bench: MaterializerBenchmark switches its renderWith helper to
  StringBuilderWriter to match the new constructor signatures.
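
The length/charAt/setLength trimming mentioned for YamlRenderer works the same way on a `StringBuilder` as on a `StringBuffer`. A minimal Java sketch, with the hypothetical helper name `TrimTrailingSketch` and no claim to match the renderer's actual code:

```java
final class TrimTrailingSketch {
    // Drops trailing spaces in place by shrinking the builder's length.
    static void trimTrailingSpaces(StringBuilder sb) {
        int end = sb.length();
        while (end > 0 && sb.charAt(end - 1) == ' ') {
            end--;
        }
        sb.setLength(end);
    }

    // Convenience wrapper for trying the helper on a plain string.
    static String trimmed(String s) {
        StringBuilder sb = new StringBuilder(s);
        trimTrailingSpaces(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + trimmed("key:   ") + "]");
    }
}
```

Because only the length is reduced, nothing is copied, which is why direct access to the underlying builder is useful here.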

Result:
- Eliminates StringBuffer monitor acquisition on every char/string write
  across all remaining renderers (Yaml/PrettyYaml/Python/Json default
  paths) and std.escapeStringXML.
- Removes one dead StringWriter allocation per quoted-string in
  PrettyYamlRenderer.
- All JVM tests pass (sjsonnet.jvm[3.3.7].test).