Bug report
Bug description:
asyncio.protocols._feed_data_to_buffered_proto feeds incoming data into a BufferedProtocol reusable buffer. The current multi-iteration path is O(N^2) because it slices data as bytes on every iteration:
# Lib/asyncio/protocols.py:200-216
def _feed_data_to_buffered_proto(proto, data):
data_len = len(data)
while data_len:
buf = proto.get_buffer(data_len)
buf_len = len(buf)
if not buf_len:
raise RuntimeError('get_buffer() returned an empty buffer')
if buf_len >= data_len:
buf[:data_len] = data
proto.buffer_updated(data_len)
return
else:
buf[:buf_len] = data[:buf_len]
proto.buffer_updated(buf_len)
data = data[buf_len:]
data_len = len(data)
data = data[buf_len:] executes N / buf_len times, each time recopying the remaining tail.
Proposed fix
Track the offset into data manually instead of reslicing:
def _feed_data_to_buffered_proto(proto, data):
data_len = len(data)
start = 0
while data_len:
buf = proto.get_buffer(data_len)
buf_len = len(buf)
if not buf_len:
raise RuntimeError('get_buffer() returned an empty buffer')
if buf_len >= data_len:
buf[:data_len] = data[start:start + data_len] if start else data
proto.buffer_updated(data_len)
return
else:
buf[:buf_len] = data[start:start + buf_len]
proto.buffer_updated(buf_len)
start += buf_len
data_len -= buf_len
Benchmarks
Benchmarks made via pyperf
+-------------------------+---------+-------------------------+
| Benchmark | old | new |
+=========================+=========+=========================+
| data= 65536 buf= 4096 | 48.7 us | 16.8 us: 2.89x faster |
+-------------------------+---------+-------------------------+
| data= 262144 buf= 4096 | 637 us | 67.4 us: 9.45x faster |
+-------------------------+---------+-------------------------+
| data= 1048576 buf= 4096 | 8.28 ms | 269 us: 30.77x faster |
+-------------------------+---------+-------------------------+
| data= 4194304 buf= 4096 | 205 ms | 1.08 ms: 189.36x faster |
+-------------------------+---------+-------------------------+
| data= 1048576 buf=65536 | 741 us | 237 us: 3.13x faster |
+-------------------------+---------+-------------------------+
| data= 100 buf= 4096 | 558 ns | 571 ns: 1.02x slower |
+-------------------------+---------+-------------------------+
| data= 1024 buf= 4096 | 636 ns | 641 ns: 1.01x slower |
+-------------------------+---------+-------------------------+
| data= 4096 buf= 4096 | 812 ns | 823 ns: 1.01x slower |
+-------------------------+---------+-------------------------+
| data= 32768 buf=65536 | 2.53 us | 2.55 us: 1.01x slower |
+-------------------------+---------+-------------------------+
| Geometric mean | (ref) | 4.27x faster |
+-------------------------+---------+-------------------------+
The scenarios when buf_len >= data_len are not regressed. The 1.01-1.02x differences can be treated as statistical noise
Full benchmark code I'll pin in the PR
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows, macOS
Linked PRs
Bug report
Bug description:
asyncio.protocols._feed_data_to_buffered_protofeeds incoming data into aBufferedProtocolreusable buffer. The current multi-iteration path is O(N^2) because it slicesdataasbyteson every iteration:data = data[buf_len:]executesN / buf_lentimes, each time recopying the remaining tail.Proposed fix
Track the offset into
datamanually instead of reslicing:Benchmarks
Benchmarks made via pyperf
The scenarios when
buf_len >= data_lenare not regressed. The 1.01-1.02x differences can be treated as statistical noiseFull benchmark code I'll pin in the PR
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows, macOS
Linked PRs
asyncio.protocols._feed_data_to_buffered_proto#150622