feat(meta): add agent-review rubric and example GitHub Action #1209

Draft
jfindlay wants to merge 1 commit into leanprover-community:master from jfindlay:AI-Policy

@jfindlay

Contributor
* `AGENTS-REVIEW.md`: A vendor-neutral rubric for a single review pass
  on a Physlib PR.
* `.github/workflows/agent-review.yml`: An example implementation
  triggered by the `agent-review` label.

Co-authored-by: Claude Sonnet 4.6 <noreply+claude-sonnet@anthropic.com>
@github-actions

Contributor

Thank you for this PR, which will now be reviewed.
If submitting to ./Physlib or ./QuantumInfo, please
see our review guidelines
if you are not familiar with the process. You should expect a back and forth
with a reviewer before your PR is merged. See also that link for how to
add appropriate labels to your PR. The PR will also go through a number
of automated checks. You can learn more about these here,
including how to run them locally.

If you are submitting to ./PhyslibAlpha there will be a lighter review process,
though your PR must still pass the automated checks.

If you want to bring attention to this PR, please write a message on this
thread of the Lean Zulip.

AGENT_REVIEW_API_KEY: ${{ secrets.AGENT_REVIEW_API_KEY }}
# Configure the model here. Default: a mid-tier model.
# Escalate to a top-tier model only for large or complex PRs by exception.
AGENT_MODEL: "claude-sonnet-4-5"

Contributor

Sonnet 4.5 is not good enough. I think we should use top-tier models here, like opus-4.8 or gpt-5.5.

Member

Do we know who actually provides the agents for this? I assume we would need some AI key.

This might be a use case of #1211

Contributor Author

> Sonnet 4.5 is not good enough. I think we should use top-tier models here, like opus-4.8 or gpt-5.5.

Sonnet 4.5 is a compromise on token cost. AGENTS-REVIEW.md, as currently written, includes some other token optimizations, such as only running the review on an explicit (re)label. I agree that some aspects of the review would be better handled by a higher-tier agent, but more mechanical tasks, like assessing docs coverage, could be given to a lower-tier agent to save cost.
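To make the two cost controls above concrete, here is a minimal Python sketch, not the code in this PR. It assumes the workflow receives a GitHub `pull_request` event payload as a dict, and that review work is split into named tasks. The task names, the tier placeholders `MID_TIER` and `TOP_TIER`, and both function names are hypothetical and chosen only for illustration.

```python
# Placeholder tier names; these are not real model identifiers.
MID_TIER = "mid-tier-model"
TOP_TIER = "top-tier-model"

# Label that opts a PR into review (taken from the PR description).
REVIEW_LABEL = "agent-review"

# Hypothetical names for the mechanical checks a cheaper model could handle.
MECHANICAL_TASKS = {"docs-coverage"}


def should_run(event: dict) -> bool:
    """Return True only when the review label itself was just applied.

    Pushes to the PR (action "synchronize") do not trigger a review,
    so tokens are spent only on an explicit (re)label.
    """
    label = event.get("label") or {}
    return event.get("action") == "labeled" and label.get("name") == REVIEW_LABEL


def pick_model(task: str) -> str:
    """Route mechanical tasks to the cheaper tier, everything else to the top tier."""
    return MID_TIER if task in MECHANICAL_TASKS else TOP_TIER


if __name__ == "__main__":
    event = {"action": "labeled", "label": {"name": REVIEW_LABEL}}
    if should_run(event):
        for task in ("docs-coverage", "proof-correctness"):
            print(task, "->", pick_model(task))
```

With this split, removing and re-adding the label requests a fresh review, while ordinary pushes cost nothing. Which tasks count as mechanical is a judgment call that the rubric would need to settle.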

> Do we know who actually provides the agents for this? I assume we would need some AI key.

Proprietary frontier agent models are less generous with their free tiers than, for example, GitHub itself. I do not know much about this area, but since this is a high-profile, important, public project, there may be a way to get a sponsorship or a discount. Perhaps GitHub or some libre software consortium has a program for this.

Contributor

Considering the current PR traffic of Physlib, I think there is no need to compromise on token cost. I also generally do not trust models weaker than Opus-4.6; they sometimes produce serious hallucinations and require a lot of engineering effort to work properly.

I am still reading the GitHub agentic workflows documentation. It seems to support AI model subscription tiers, in which case a 100-200 USD/month subscription should definitely be enough.

3 participants