Preserve trailing whitespace after the final semicolon#859
Open
Sanjays2402 wants to merge 1 commit into
Open
Conversation
parse() is meant to be lossless: joining the returned statements should
reproduce the input exactly. This held for trailing spaces/tabs after the
last ";" but not for trailing whitespace containing a newline --
str(parse("select 1;\n")) was "select 1;", dropping the "\n".
Root cause is in StatementSplitter.process(). Once a ";" arms consume_ws,
a following newline token is deliberately treated as "not whitespace" for
end-of-statement detection (so that "a;\nb;" splits into two statements),
which starts a new statement buffer for the trailing whitespace. When that
whitespace is at the very end of the input there is no following statement,
so the buffer stays all-whitespace and is discarded by the final
"not all whitespace" guard -- silently losing the exact input.
Fix: hold each completed statement back by one segment. A held statement is
emitted only once the next real (non-whitespace) token confirms a new
statement has begun -- so inter-statement newline placement is byte-for-byte
unchanged ("a;\nb;" still yields ["a;", "\nb;"]). At end of stream, any
leftover all-whitespace tokens are reattached to the held statement instead
of being dropped, making the trailing case lossless too. Whitespace-only
input still yields zero statements, and split() is unaffected because it
strips each statement.
Verified with a 200k-iteration round-trip fuzzer (str(parse(sql)) == sql):
31+ failures before, 0 after; full suite 494 passed.
Add parametrized regression test test_split_preserves_trailing_whitespace
covering "\n", "\r\n", multiple/mixed trailing whitespace, the multi-
statement case and a bare ";\n"; it fails on all 7 cases without the fix.
There was a problem hiding this comment.
Pull request overview
This PR fixes a losslessness gap in sqlparse.parse() where trailing whitespace containing newlines after the final semicolon (;) could be dropped when re-joining parsed statements, breaking the expected round-trip property for inputs like "select 1;\n".
Changes:
- Update
StatementSplitter.process()to hold back the most recently completed statement and, at end-of-stream, reattach any trailing all-whitespace tokens to it instead of discarding them. - Add a parametrized regression test ensuring
''.join(map(str, sqlparse.parse(s))) == sfor a variety of trailing-newline and multi-statement cases. - Document the behavioral fix in the changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
sqlparse/engine/statement_splitter.py |
Holds one completed statement back and reattaches trailing all-whitespace at EOF to preserve byte-for-byte round-trips. |
tests/test_split.py |
Adds regression coverage for trailing newline/whitespace preservation when joining parse() results. |
CHANGELOG |
Notes the bug fix and the restored str(parse(sql)) == sql behavior for inputs ending with newline after ;. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
sqlparse.parse()is meant to be lossless — joining the returnedstatements should reproduce the input exactly. That held for trailing
spaces/tabs after the final
;, but not for trailing whitespacecontaining a newline:
"".join(map(str, parse(input)))before"select 1;""select 1;""select 1;""select 1; "(space)"select 1; ""select 1; ""select 1;\n""select 1;"❌"select 1;\n"✅"select 1;\r\n""select 1;"❌"select 1;\r\n"✅"select 1;\n\n""select 1;"❌"select 1;\n\n"✅";\n"";"❌";\n"✅Trailing spaces were preserved but trailing newlines were not, which is a
surprising inconsistency for a library whose core guarantee is that parsing
never mutates the SQL.
Root cause
In
StatementSplitter.process(), once a;setsconsume_ws = True, afollowing newline token is deliberately treated as "not whitespace" for
end-of-statement detection (see the existing comment: "It will count
newline token as a non whitespace"). This is what makes
"a;\nb;"splitinto two statements.
The side effect: that newline starts a fresh statement buffer. When the
whitespace is at the very end of the input there is no following
statement, so the buffer stays all-whitespace and is discarded by the final
not all(t.is_whitespace ...)guard — silently losing the exact input.Fix
Hold each completed statement back by one segment. A held statement is
emitted only once the next real (non-whitespace) token confirms a new
statement has actually begun, so inter-statement newline placement is
byte-for-byte unchanged:
At end of stream, any leftover all-whitespace tokens are reattached to the
held statement instead of being dropped, making the trailing case lossless.
Unchanged behaviors (verified):
" ","\n") still yields zero statements.sqlparse.split()is unaffected — it.strip()s each statement anyway.Tests
Added a parametrized regression test
tests/test_split.py::test_split_preserves_trailing_whitespacecovering\n,\r\n, multiple/mixed trailing whitespace, the multi-statement case,and a bare
;\n.Proof it guards the bug (stash the source fix, keep the test):
Also verified with a 200k-iteration round-trip fuzzer asserting
str(parse(sql)) == sql(excluding whitespace-only inputs): 31+ failuresbefore, 0 after.
Verification
regressions).
ruff checkclean on both changed files.statement_splitter.py,tests/test_split.py,and a
CHANGELOGbullet.