Evaluation: Fix LLM scoring issues

**Is your feature request related to a problem?**  
The topic relevance LLM validator incorrectly scores user queries in pure Hindi (Devanagari script) due to mixed scripts in the message. This leads to inconsistent scoring, causing confusion in the model's assessments.

**Describe the solution you'd like**  
- Move scoring instructions to the system prompt  
- Keep user messages as raw queries only  

<details><summary>Original issue</summary>

**Describe the bug**
A clear and concise description of what the bug is.
The topic relevance LLM validator returns incorrect scores for user queries written in pure Hindi (Devanagari script). The same query written in romanized Hindi (Latin script) scores correctly.

The root cause is that scoring instructions were appended to the user message rather than placed in the system prompt. This caused the user message to be a mix of Devanagari script (the query) followed by English scoring rules — a script boundary within the same message that confuses the model. For romanized Hindi there is no script boundary, so it works correctly. The fix moves scoring instructions into the system prompt, keeping the user message as the raw query only.

**To Reproduce**
Steps to reproduce the behavior:

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.


</details>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluation: Fix LLM scoring issues #130

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Evaluation: Fix LLM scoring issues #130

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions