OpenAI: Topic relevance guardrail #126
Merged
6 commits
0167471 added open ai topic relevance guardrail (rkritika1508)
0394bfc updates (rkritika1508)
a3ca650 added threshold to settings (rkritika1508)
19c1ca8 Merge branch 'main' into feat/open-ai-topic-relevance (rkritika1508)
7532f41 resolved comments (rkritika1508)
80727ff cleanup PR (AkhileshNegi)
31 changes: 31 additions & 0 deletions
backend/app/core/validators/config/topic_relevance_openai_safety_validator_config.py

    @@ -0,0 +1,31 @@
    +from typing import Literal, Optional
    +from uuid import UUID
    +
    +from pydantic import Field
    +
    +from app.core.config import settings
    +from app.core.validators.config.base_validator_config import BaseValidatorConfig
    +from app.core.validators.topic_relevance_openai import TopicRelevanceOpenAI
    +
    +
    +class TopicRelevanceOpenAISafetyValidatorConfig(BaseValidatorConfig):
    +    type: Literal["topic_relevance_openai"]
    +    configuration: Optional[str] = None
    +    llm_callable: str = settings.DEFAULT_LLM_CALLABLE
    +    threshold: int = Field(
    +        default=settings.TOPIC_RELEVANCE_OPENAI_THRESHOLD, ge=1, le=3
    +    )
    +    topic_relevance_config_id: Optional[UUID] = None
    +
    +    def build(self):
    +        if not settings.OPENAI_API_KEY:
    +            raise ValueError(
    +                "OPENAI_API_KEY is not configured. "
    +                "Topic relevance (OpenAI) validation requires an OpenAI API key."
    +            )
    +        return TopicRelevanceOpenAI(
    +            system_prompt=self.configuration or "",
    +            llm_callable=self.llm_callable,
    +            threshold=self.threshold,
    +            on_fail=self.resolve_on_fail(),
    +        )

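To make the `threshold` bounds concrete: the config restricts `threshold` to the range 1 to 3, and the validator docstring in this PR says a message passes when its score is at least the threshold. The following is a minimal standalone sketch of that comparison, not the project's code. The names `passes`, `passing_scores`, and `VALID_SCORES` are hypothetical and introduced only for illustration, and the plain `ValueError` check stands in for the Pydantic `ge=1, le=3` constraint.

```python
# Hypothetical illustration of the score/threshold rule described in this PR.
VALID_SCORES = (1, 2, 3)  # 1 = outside scope, 2 = partially related, 3 = in scope


def passes(score: int, threshold: int) -> bool:
    """Return True when a scope score meets the configured threshold."""
    if threshold not in VALID_SCORES:
        # Stand-in for the ge=1, le=3 bounds on the config field.
        raise ValueError(f"threshold must be one of {VALID_SCORES}, got {threshold!r}")
    return score >= threshold


def passing_scores(threshold: int) -> list[int]:
    """List the scores that would pass for a given threshold."""
    return [s for s in VALID_SCORES if passes(s, threshold)]


if __name__ == "__main__":
    for t in VALID_SCORES:
        print(t, passing_scores(t))
```

Under this rule a threshold of 1 accepts every valid score, so it effectively disables filtering, while a threshold of 3 accepts only messages scored as clearly in scope.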
6 changes: 2 additions & 4 deletions
backend/app/core/validators/config/topic_relevance_safety_validator_config.py

    @@ -0,0 +1,15 @@
    +from litellm import get_supported_openai_params
    +
    +# Passed to litellm/OpenAI to force a strict JSON object response.
    +JSON_OBJECT_RESPONSE_FORMAT = {"type": "json_object"}
    +
    +
    +def supports_response_format(model: str) -> bool:
    +    """Return True if the given model supports the OpenAI ``response_format`` param.
    +
    +    Falls back to False if litellm cannot resolve the model's capabilities.
    +    """
    +    try:
    +        return "response_format" in (get_supported_openai_params(model=model) or [])
    +    except Exception:
    +        return False

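The fallback behaviour of `supports_response_format` can be exercised without litellm. The sketch below reproduces the same "probe, and treat any failure as unsupported" pattern with an injected lookup function. Everything here is an assumption for illustration: `supports_param`, `fake_lookup`, and the model names `model-a` and `model-b` are hypothetical, and `fake_lookup` merely stands in for litellm's capability lookup.

```python
from typing import Callable, Optional, Sequence


def supports_param(
    model: str,
    param: str,
    lookup: Callable[[str], Optional[Sequence[str]]],
) -> bool:
    """Return True only if `lookup` positively reports `param` for `model`."""
    try:
        # `or []` covers a lookup that returns None instead of a list.
        return param in (lookup(model) or [])
    except Exception:
        # Unknown model or lookup failure: assume the parameter is unsupported.
        return False


def fake_lookup(model: str) -> Optional[Sequence[str]]:
    """Hypothetical stand-in for a capability lookup (illustration only)."""
    table = {
        "model-a": ["temperature", "response_format"],
        "model-b": None,  # capabilities unknown
    }
    if model not in table:
        raise KeyError(model)
    return table[model]


if __name__ == "__main__":
    for name in ("model-a", "model-b", "model-c"):
        print(name, supports_param(name, "response_format", fake_lookup))
```

Defaulting to `False` means the validator simply omits `response_format` for models it cannot resolve and relies on the prompt instructions alone to get JSON back.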

    @@ -0,0 +1,123 @@
    +from __future__ import annotations
    +
    +import json
    +import re
    +from typing import Callable, Optional
    +
    +from guardrails import OnFailAction
    +from guardrails.validators import (
    +    FailResult,
    +    PassResult,
    +    ValidationResult,
    +    Validator,
    +    register_validator,
    +)
    +from litellm import completion
    +
    +from app.core.config import settings
    +from app.core.constants import EMPTY_MESSAGE_ERROR, TOPIC_OUT_OF_SCOPE_ERROR
    +from app.core.validators.llm_utils import (
    +    JSON_OBJECT_RESPONSE_FORMAT,
    +    supports_response_format,
    +)
    +
    +# Valid scope scores returned by the model; the highest means "clearly in scope".
    +_VALID_SCORES = (1, 2, 3)
    +# Cap the response: a single ``{"scope_violation": <score>}`` object is tiny.
    +_MAX_TOKENS = 50
    +
    +_SCORING_INSTRUCTIONS = (
    +    "\n\nScore using:\n"
    +    f"{_VALID_SCORES[2]} = clearly within scope (directly matches a topic description)\n"
    +    f"{_VALID_SCORES[1]} = partially related (tangentially related or implicitly within scope)\n"
    +    f"{_VALID_SCORES[0]} = clearly outside scope (no relation to any listed topic)\n"
    +    "\nRespond ONLY with a JSON object in this exact format: "
    +    '{"scope_violation": <score>} where <score> is the integer '
    +    f"{_VALID_SCORES[0]}, {_VALID_SCORES[1]}, or {_VALID_SCORES[2]}."
    +)
    +
    +
    +@register_validator(name="topic-relevance-openai", data_type="string")
    +class TopicRelevanceOpenAI(Validator):
    +    """
    +    Validates whether a user message is within the defined topic scope
    +    using a direct OpenAI/litellm call.
    +
    +    The caller supplies the full system prompt. The validator appends
    +    hardcoded scoring and response-format instructions.
    +
    +    Scores 1–3 where 3 = clearly in scope, 2 = partially related,
    +    1 = outside scope. Passes when score >= threshold (default 2).
    +    """
    +
    +    def __init__(
    +        self,
    +        system_prompt: str,
    +        llm_callable: str = settings.DEFAULT_LLM_CALLABLE,
    +        threshold: int = settings.TOPIC_RELEVANCE_OPENAI_THRESHOLD,
    +        on_fail: Optional[Callable] = OnFailAction.NOOP,
    +    ):
    +        super().__init__(on_fail=on_fail)
    +
    +        self.llm_callable = llm_callable
    +        self.threshold = threshold
    +        self._invalid_config_reason: Optional[str] = None
    +        self._system_prompt: Optional[str] = None
    +        self._supports_response_format: bool = False
    +
    +        if not system_prompt or not system_prompt.strip():
    +            self._invalid_config_reason = "system_prompt is blank or missing"
    +            return
    +
    +        self._system_prompt = system_prompt.strip() + _SCORING_INSTRUCTIONS
    +        self._supports_response_format = supports_response_format(llm_callable)
    +
    +    def _validate(
    +        self, value: str, metadata: Optional[dict] = None
    +    ) -> ValidationResult:
    +        if self._invalid_config_reason:
    +            return FailResult(error_message=self._invalid_config_reason)
    +
    +        if not value or not value.strip():
    +            return FailResult(error_message=EMPTY_MESSAGE_ERROR)
    +
    +        try:
    +            kwargs = {
    +                "model": self.llm_callable,
    +                "messages": [
    +                    {"role": "system", "content": self._system_prompt},
    +                    {"role": "user", "content": value},
    +                ],
    +                "max_tokens": _MAX_TOKENS,
    +            }
    +            if self._supports_response_format:
    +                kwargs["response_format"] = JSON_OBJECT_RESPONSE_FORMAT
    +
    +            response = completion(**kwargs)
    +            content = response.choices[0].message.content.strip()
    +        except Exception as e:
    +            return FailResult(error_message=f"LLM call failed: {e}")
    +
    +        try:
    +            text = re.sub(r"```(?:json)?\s*|\s*```", "", content).strip()
    +            match = re.search(r"\{[^{}]*\}", text)
    +            if not match:
    +                raise ValueError("no JSON object found in response")
    +            data = json.loads(match.group())
    +            score = data.get("scope_violation")
    +            # `type(score) is not int` (not isinstance) deliberately rejects bool,
    +            # which is an int subclass, so `true`/`false` are treated as invalid.
    +            if type(score) is not int or score not in _VALID_SCORES:
    +                raise ValueError(f"unexpected score value: {score!r}")
    +        except Exception as e:
    +            return FailResult(
    +                error_message=f"LLM returned an unparseable response: {e}. Raw: {content!r}"
    +            )
    +
    +        if score >= self.threshold:
    +            return PassResult(value=value, metadata={"scope_score": score})
    +
    +        return FailResult(
    +            error_message=TOPIC_OUT_OF_SCOPE_ERROR,
    +            metadata={"scope_score": score},
    +        )

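The response-parsing step in `_validate` is the part most sensitive to model output quirks, so it may help to see it in isolation. The sketch below extracts that logic into a standalone, stdlib-only function so it can be tried without guardrails or litellm. The function name `parse_scope_score` and the `FENCE` constant are hypothetical; the regexes and the strict `type(...) is not int` check follow the diff above, with the Markdown fence marker built from a constant only to keep this snippet easy to embed.

```python
import json
import re

VALID_SCORES = (1, 2, 3)
FENCE = "`" * 3  # Markdown code-fence marker, built indirectly for readability


def parse_scope_score(content: str) -> int:
    """Extract and validate the integer score from a raw model reply."""
    # Drop Markdown code fences the model may have wrapped around the JSON.
    text = re.sub(rf"{FENCE}(?:json)?\s*|\s*{FENCE}", "", content).strip()
    # Take the first flat (non-nested) JSON object in the reply.
    match = re.search(r"\{[^{}]*\}", text)
    if not match:
        raise ValueError("no JSON object found in response")
    score = json.loads(match.group()).get("scope_violation")
    # bool is a subclass of int, so an exact type check is needed to reject
    # replies such as {"scope_violation": true}.
    if type(score) is not int or score not in VALID_SCORES:
        raise ValueError(f"unexpected score value: {score!r}")
    return score


if __name__ == "__main__":
    samples = [
        '{"scope_violation": 3}',
        'Here you go: {"scope_violation": 2}',
        '{"scope_violation": true}',
        "no json here",
    ]
    for raw in samples:
        try:
            print(repr(raw), "->", parse_scope_score(raw))
        except ValueError as exc:
            print(repr(raw), "-> rejected:", exc)
```

Note that `isinstance(True, int)` is `True` in Python and `True == 1`, so an `isinstance` check combined with `score in VALID_SCORES` would silently accept `true` as a score of 1. Malformed JSON is also covered, because `json.JSONDecodeError` is a subclass of `ValueError`.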