code_review: tolerate double-encoded enum tool args#6141
Open
suhaibmujahid wants to merge 1 commit into
Open
Conversation
Models occasionally send enum tool arguments double-encoded, e.g. the literal '"exclude"' (quotes included) for the search_text/search_identifier 'tests' parameter, which failed pydantic Literal validation and crashed the tool call with a ValidationError. Add a reusable bugbug.tools.core.validators module exposing strip_enum_quotes and a StripEnumQuotes BeforeValidator, used as Annotated[Literal[...], StripEnumQuotes] so any LLM-fed enum param can opt in. The 'tests' and 'langs' params now strip surrounding quotes/whitespace before the Literal check. The JSON schema sent to Anthropic is unchanged (still emits the enum), so strict-mode validation is preserved. Fixes mozilla#6140
There was a problem hiding this comment.
Pull request overview
This PR makes the code-review toolchain more tolerant of a common LLM failure mode where enum-valued tool arguments are double-encoded (e.g. the literal '"exclude"'), by stripping surrounding quotes/whitespace before Literal[...] validation. This prevents pydantic ValidationErrors while keeping the generated JSON schema enums intact for strict-mode validation.
Changes:
- Added a reusable
bugbug.tools.core.validatorsmodule withstrip_enum_quotesand aStripEnumQuotesBeforeValidator. - Updated code-review Searchfox tool argument types (
langs,tests) to opt into quote-stripping viaAnnotated[..., StripEnumQuotes]. - Added unit/regression tests covering both the validator behavior and the specific
search_text/search_identifierregression from #6140.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
bugbug/tools/core/validators.py |
Introduces a reusable BeforeValidator to unwrap double-encoded enum-like strings. |
bugbug/tools/code_review/langchain_tools.py |
Applies the validator to Searchfox tool enum params (langs, tests) via Annotated. |
tests/test_tools_core_validators.py |
Adds focused unit tests for quote-stripping and schema preservation. |
tests/test_code_review.py |
Adds async regression tests ensuring tool invocation accepts double-encoded args and passes normalized values to the client. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
marco-c
approved these changes
Jun 9, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Models occasionally send enum tool arguments double-encoded, e.g. the literal '"exclude"' (quotes included) for the search_text/search_identifier 'tests' parameter, which failed pydantic Literal validation and crashed the tool call with a ValidationError.
Add a reusable bugbug.tools.core.validators module exposing strip_enum_quotes and a StripEnumQuotes BeforeValidator, used as Annotated[Literal[...], StripEnumQuotes] so any LLM-fed enum param can opt in. The 'tests' and 'langs' params now strip surrounding quotes/whitespace before the Literal check. The JSON schema sent to Anthropic is unchanged (still emits the enum), so strict-mode validation is preserved.
Fixes #6140