Goal
Measure baseline scorer behavior against reviewer labels.
Acceptance criteria
- Human label fixture format is documented.
- Evaluation report includes agreement metrics and per-dimension errors.
- The harness can run locally without external model credentials.
- Results clearly mark the baseline as unvalidated.
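
The criteria above could be met by a harness along the following lines. This is a minimal sketch, not the project's actual design: the JSON Lines fixture layout, the field names (`item_id`, `dimension`, `human`, `scorer`), the choice of Cohen's kappa and mean absolute error as metrics, and the `baseline_status` report key are all assumptions made for illustration. It uses only the Python standard library, so it runs locally without any model credentials.

```python
import json
from collections import Counter, defaultdict

# Hypothetical fixture: one JSON object per line holding an item id, a rubric
# dimension, the reviewer's label, and the baseline scorer's label.
FIXTURE = """\
{"item_id": "a1", "dimension": "clarity", "human": 3, "scorer": 3}
{"item_id": "a1", "dimension": "accuracy", "human": 2, "scorer": 3}
{"item_id": "a2", "dimension": "clarity", "human": 1, "scorer": 1}
{"item_id": "a2", "dimension": "accuracy", "human": 3, "scorer": 3}
"""


def load_rows(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def exact_agreement(rows):
    """Fraction of rows where the scorer label equals the reviewer label."""
    return sum(r["human"] == r["scorer"] for r in rows) / len(rows)


def cohen_kappa(rows):
    """Chance-corrected agreement between reviewer and scorer labels."""
    n = len(rows)
    observed = exact_agreement(rows)
    human = Counter(r["human"] for r in rows)
    scorer = Counter(r["scorer"] for r in rows)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(human[k] * scorer[k] for k in human) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1 - expected)


def per_dimension_mae(rows):
    """Mean absolute label difference, grouped by rubric dimension."""
    errors = defaultdict(list)
    for r in rows:
        errors[r["dimension"]].append(abs(r["human"] - r["scorer"]))
    return {dim: sum(vals) / len(vals) for dim, vals in errors.items()}


def build_report(rows):
    """Assemble the evaluation report; the baseline is flagged unvalidated."""
    return {
        "baseline_status": "unvalidated",
        "n_labels": len(rows),
        "exact_agreement": exact_agreement(rows),
        "cohen_kappa": cohen_kappa(rows),
        "per_dimension_mae": per_dimension_mae(rows),
    }


if __name__ == "__main__":
    print(json.dumps(build_report(load_rows(FIXTURE)), indent=2))
```

Keeping the status flag inside the report itself, rather than in surrounding documentation, means any consumer of the results sees that the baseline is unvalidated. Mean absolute error assumes the labels are ordinal numbers; a categorical label set would need a different per-dimension error measure.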