
code_review: tolerate double-encoded enum tool args #6141

Open
suhaibmujahid wants to merge 1 commit into mozilla:master from suhaibmujahid:fix

Conversation

@suhaibmujahid

Member

Models occasionally send enum tool arguments double-encoded, e.g. the literal '"exclude"' (quotes included) for the search_text/search_identifier 'tests' parameter, which failed pydantic Literal validation and crashed the tool call with a ValidationError.

Add a reusable bugbug.tools.core.validators module exposing strip_enum_quotes and a StripEnumQuotes BeforeValidator, used as Annotated[Literal[...], StripEnumQuotes] so any LLM-fed enum param can opt in. The 'tests' and 'langs' params now strip surrounding quotes/whitespace before the Literal check. The JSON schema sent to Anthropic is unchanged (still emits the enum), so strict-mode validation is preserved.
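The following is a minimal sketch of what such a validator module could look like, assuming pydantic v2. It is not the code in this PR.

The quote-peeling details are assumptions: single quotes are treated like double quotes, and nested pairs are unwrapped. The Literal values "include" and "only" are hypothetical; only "exclude" appears in the PR text.

The sketch shows two properties. First, the BeforeValidator normalizes the raw string before the Literal check runs. Second, the JSON schema still advertises the enum.

```python
from typing import Annotated, Any, Literal

from pydantic import BeforeValidator, TypeAdapter, ValidationError

# Assumption: both double and single quotes count as "surrounding quotes".
_QUOTE_CHARS = ('"', "'")


def strip_enum_quotes(value: Any) -> Any:
    """Unwrap a double-encoded enum string such as '"exclude"' -> 'exclude'.

    Non-string values are returned untouched so pydantic can still report
    its normal type error for them.
    """
    if not isinstance(value, str):
        return value
    stripped = value.strip()
    # Peel off matching pairs of surrounding quotes, trimming whitespace inside them.
    while (
        len(stripped) >= 2
        and stripped[0] == stripped[-1]
        and stripped[0] in _QUOTE_CHARS
    ):
        stripped = stripped[1:-1].strip()
    return stripped


# Runs before the Literal check; the generated JSON schema is left unchanged.
StripEnumQuotes = BeforeValidator(strip_enum_quotes)

# Hypothetical value set: only "exclude" is confirmed by the PR text.
TestsParam = Annotated[Literal["include", "exclude", "only"], StripEnumQuotes]

adapter = TypeAdapter(TestsParam)
print(adapter.validate_python('"exclude"'))  # exclude
print(adapter.json_schema()["enum"])  # ['include', 'exclude', 'only']

try:
    adapter.validate_python("bogus")
except ValidationError:
    print("unknown values are still rejected")
```

Keeping the normalization in a BeforeValidator, rather than widening the type to str, preserves the Literal check. Genuinely invalid values still fail after the quotes are removed.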

Fixes #6140

Models occasionally send enum tool arguments double-encoded, e.g. the
literal '"exclude"' (quotes included) for the search_text/search_identifier
'tests' parameter, which failed pydantic Literal validation and crashed the
tool call with a ValidationError.

Add a reusable bugbug.tools.core.validators module exposing strip_enum_quotes
and a StripEnumQuotes BeforeValidator, used as Annotated[Literal[...],
StripEnumQuotes] so any LLM-fed enum param can opt in. The 'tests' and 'langs'
params now strip surrounding quotes/whitespace before the Literal check. The
JSON schema sent to Anthropic is unchanged (still emits the enum), so
strict-mode validation is preserved.

Fixes mozilla#6140

Copilot AI left a comment

Pull request overview

This PR makes the code-review toolchain more tolerant of a common LLM failure mode where enum-valued tool arguments are double-encoded (e.g. the literal '"exclude"'), by stripping surrounding quotes/whitespace before Literal[...] validation. This prevents pydantic ValidationErrors while keeping the generated JSON schema enums intact for strict-mode validation.

Changes:

  • Added a reusable bugbug.tools.core.validators module with strip_enum_quotes and a StripEnumQuotes BeforeValidator.
  • Updated code-review Searchfox tool argument types (langs, tests) to opt into quote-stripping via Annotated[..., StripEnumQuotes].
  • Added unit/regression tests covering both the validator behavior and the specific search_text / search_identifier regression from #6140.
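The #6140 regression can be illustrated with two tool-argument models, assuming pydantic v2. The names PlainSearchTextArgs and FixedSearchTextArgs, the query field, the "include" default, and the extra Literal values are all hypothetical. They are not the actual search_text schema in bugbug.

The bare Literal rejects the double-encoded '"exclude"' with a ValidationError. The annotated field accepts it and still exposes the enum in its JSON schema.

```python
from typing import Annotated, Any, Literal

from pydantic import BaseModel, BeforeValidator, ValidationError

# Hypothetical value set; only "exclude" is confirmed by the PR text.
TestsLiteral = Literal["include", "exclude", "only"]


def strip_enum_quotes(value: Any) -> Any:
    # Hypothetical compact stand-in: trim whitespace, then one pair of quotes.
    if isinstance(value, str):
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1].strip()
    return value


class PlainSearchTextArgs(BaseModel):
    """Hypothetical args schema before the fix: a bare Literal."""

    query: str
    tests: TestsLiteral = "include"


class FixedSearchTextArgs(BaseModel):
    """Hypothetical args schema after the fix: the Literal opts in to quote stripping."""

    query: str
    tests: Annotated[TestsLiteral, BeforeValidator(strip_enum_quotes)] = "include"


# Double-encoded enum value, as reported in #6140.
raw_args = {"query": "nsIFoo", "tests": '"exclude"'}

try:
    PlainSearchTextArgs(**raw_args)
except ValidationError as err:
    print(f"before: {err.error_count()} validation error")

print("after:", FixedSearchTextArgs(**raw_args).tests)

# The schema advertised to the model still lists the allowed values.
print(FixedSearchTextArgs.model_json_schema()["properties"]["tests"]["enum"])
```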

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

  • bugbug/tools/core/validators.py: Introduces a reusable BeforeValidator to unwrap double-encoded enum-like strings.
  • bugbug/tools/code_review/langchain_tools.py: Applies the validator to Searchfox tool enum params (langs, tests) via Annotated.
  • tests/test_tools_core_validators.py: Adds focused unit tests for quote-stripping and schema preservation.
  • tests/test_code_review.py: Adds async regression tests ensuring tool invocation accepts double-encoded args and passes normalized values to the client.


Comment thread tests/test_tools_core_validators.py

Labels: None yet
Projects: None yet

Development: Successfully merging this pull request may close these issues.
  • ValidationError: 1 validation error for search_text

3 participants