Skip to content

Client response muxer#2

Open
film42 wants to merge 12 commits into
masterfrom
gt/muxer
Open

Client response muxer#2
film42 wants to merge 12 commits into
masterfrom
gt/muxer

Conversation

@film42

@film42 film42 commented Jan 5, 2023

Copy link
Copy Markdown
Contributor

This adds a single subscription response router. This means all client RPC requests will share one single subscription. This works by subscribing to <base-uuid>.*. Every time we send an RPC request, we store a new random <request-uuid> in a map, and set the reply inbox as <base-uuid>.<request-uuid>. Now, a single thread can pull messages from the wildcard subscription and signal to other waiting threads that their message is ready to be consumed.

There is a bit more plumbing required to make this work, but if this work successfully and fast on both jruby and MRI, we can remove our jruby specific jnats client and the custom subscription pool stuff that we added a few years back. That would be a huge win.

Old jruby client single threaded test:

$ PB_NATS_CLIENT_SUBSCRIPTION_POOL_SIZE=512 bx ruby -I lib bench/real_client.rb           
I, [2023-01-05T07:50:13.022476 #57405]  INFO -- : Using Protobuf::Nats::JNats to connect
Warming up --------------------------------------
single threaded performance
                         5.000  i/100ms
Calculating -------------------------------------
single threaded performance
                         26.230  (±118.2%) i/s -    110.000  in  10.732526s

New muxed client on jruby (not using jnats):

$ PB_NATS_DISABLE_JNATS=1 bx ruby -I lib bench/real_client.rb
I, [2023-01-05T07:49:17.535747 #57225]  INFO -- : Using NATS::Client to connect
Warming up --------------------------------------
single threaded performance
                        66.000  i/100ms
Calculating -------------------------------------
single threaded performance
                          3.101k (±13.4%) i/s -     30.294k in  10.007279s

I'm not entirely sure where the magical speedup is coming from at this point. I guess it's good to know that things appear to be working well on jruby without jnats.

MRI:

Old version:

$ bx ruby -I lib bench/real_client.rb             
I, [2023-01-05T07:54:14.300803 #57856]  INFO -- : Using NATS::IO::Client to connect
Warming up --------------------------------------
single threaded performance
                       304.000  i/100ms
Calculating -------------------------------------
single threaded performance
                          3.232k (± 7.8%) i/s -     32.224k in  10.074739s

New version:

$ bx ruby -I lib bench/real_client.rb
I, [2023-01-05T07:54:46.990533 #57930]  INFO -- : Using NATS::Client to connect
Warming up --------------------------------------
single threaded performance
                       294.000  i/100ms
Calculating -------------------------------------
single threaded performance
                          3.128k (±10.0%) i/s -     30.870k in  10.037075s

New version is reporting a little slower, but I don't think it's a substantial impact. I guess I'm more confused by the bench results than anything.

Few other things:

  1. The test suite is mocked out (my fault), so there are some changes to make it work for MRI and jruby.
  2. Added a new platform.rb file to make it easier to detect jruby in a single place (DRY) but also to disable the jnats loader (using the upgraded pure ruby client instead). By default there are no breaking changes.

@film42

film42 commented Jan 6, 2023

Copy link
Copy Markdown
Contributor Author

Released as v0.12.0.pre0 for beta testing.

@skunkworker skunkworker mentioned this pull request Jun 1, 2026
skunkworker added a commit to skunkworker/protobuf-nats that referenced this pull request Jun 23, 2026
Server-side follow-up to the muxer work (same nats-pure-era weak spots),
with a benchmark proving the wins.

#1 Fan out request intake (super_subscription_manager.rb)
- The shared intake queue was drained by ONE thread that also published every
  ACK/NACK, so on JRuby intake was pinned to a core and one slow publish (e.g.
  nats-pure's buffer during a reconnect) head-of-line blocked every subject.
- Drain with PB_NATS_SERVER_SUBSCRIPTION_HANDLERS threads (default
  processor_count on JRuby, 1 on CRuby), each with a per-thread self-healing
  backoff counter (a shared one would lose updates, like the muxer bug).
- N-poison-pill shutdown; guard @subscriptions with a mutex.
- NATS queue-group semantics and subscription counts are unchanged: each
  request is still delivered to exactly one consumer (SizedQueue#pop is atomic).
- Bench: ~8.5x intake throughput; head-of-line stall ~505ms -> ~0.4ms (8 handlers).

mxenabled#2 Handler observability, long ops first-class (server.rb)
- Handlers are NEVER aborted; long-running operations (>= a minute) are allowed.
- Track in-flight handlers (Concurrent::Map, monotonic start times) and emit:
  inflight_count, inflight_oldest_age_ms, overdue_handler_count, handler_overdue,
  pending_intake_queue_size, slow_handler (opt-in), thread_pool_saturated.
- "overdue" is keyed to the client's response_timeout (PB_NATS_SERVER_HANDLER_
  OVERDUE_MS, default 65s) so normal long ops aren't flagged.
- Switch server duration metrics to a monotonic clock.

Adds bench/server_intake_bench.rb and negative tests for both (head-of-line
non-blocking, N-thread shutdown, long-op-allowed/not-aborted/not-flagged,
slow/overdue thresholds, saturation). Full suite: 180 examples, 0 failures.
Docs updated (README env vars + How it works, bench/bench.md, CHANGELOG).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants