self-evaluation

Here are 15 public repositories matching this topic...

Alsace08 / Chain-of-Embedding

[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"

interpretability trustworthy-ai large-language-models mechanistic-interpretability self-evaluation hallucination-detection iclr-2025

Updated Dec 19, 2024
Python

ProductionOS v1.0 — Claude Code plugin with 76 agents, 39 commands, and 12 hooks. Deploys specialized agents that review, score, and improve your entire codebase. Smart routing, recursive convergence, self-evaluation.

security-audit multi-agent code-review prompt-engineering self-evaluation deep-research llm-judge claude-code agentic-development claude-code-plugin recursive-improvement auto-swarm max-research production-upgrade convergence-engine worktree-isolation

Updated Apr 16, 2026
TypeScript

Anbu-00001 / Anamnesis

Local-first, offline, no-LLM CLI that scores how well your confidence matches reality. Log a falsifiable prediction before you act, then get a Brier/calibration score when it resolves. Built first for coding agents: your standing over- or under-confidence is injected into every session. For both humans and agents.

rust cli offline mcp calibration forecasting ai-agents metacognition local-first self-evaluation llm-agents agent-memory brier-score no-llm

Updated Jun 9, 2026
Rust
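
For readers unfamiliar with the Brier score mentioned in the entry above, here is a minimal sketch of the standard definition for binary outcomes (the mean squared difference between the predicted probability and the 0/1 outcome). The function name `brier_score` and its arguments are illustrative assumptions and are not taken from the Anamnesis repository.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Hypothetical helper for illustration only; lower is better, and
    0.0 means every prediction was both confident and correct.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Always answering 50% scores 0.25 no matter how the predictions resolve.
print(brier_score([0.5, 0.5], [1, 0]))
```

A forecaster who always says 50% scores 0.25 regardless of outcomes, so scores below that suggest the stated confidence carries some information.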

eliumusk / agentreflect

AI agent self-reflection & self-evaluation tool. Built by an AI, for AIs.

ai-agent build-in-public llm autonomous-ai self-evaluation agent-evaluation ai-reflection

Updated Mar 1, 2026
Python

canhdien69-tech / Nang-AI-runtime-behavior

Lightweight behavior control layer for LLM using latent state, reward, and self-evaluation (no training required)

reinforcement-learning autonomous-agents ai-agents self-evaluation llm-agent local-llm hallucination-detection behavior-system

Updated Mar 29, 2026
Python

ruthuraraj-ml / A-Memory-Aware-Agentic-Travel-Planning-System-with-Self-Evaluation-and-Conditional-Re-Search

A cognitive agent architecture using LangGraph and Python custom orchestration for adaptive travel planning. Employs a non-destructive state machine with dynamic self-evaluation, conditional re-search loops to fix data gaps, and robust Streamlit UI persistence guards alongside token-optimized data serialization.

gemini groq streamlit langchain self-evaluation langgraph reflection-loop multi-tool-orchestration

Updated Jun 6, 2026
Python

AdarshMS16 / PlacementBot

A Telegram chatbot that helps aspirants prepare for upcoming placements.

python nlp telegram telegram-bot chatbot self-learning placement-preparation self-evaluation drive-update

Updated Feb 27, 2024
Python

Chris4081 / maat-reflection

Maat Reflection – Extension for the text generation WebUI to add self-reflection, heuristics and improved reasoning

research text-generation ai-safety metacognition llm text-generation-webui self-evaluation ai-reflection

Updated Oct 24, 2025
Python

smqd19 / RAG_Technical_Support

RAG-powered technical support system with self-evaluation pipeline and grading metrics

python technical-support rag llm self-evaluation

Updated Apr 16, 2026
Python

YagniPatel / llm-calibration-self-evaluation

Self-evaluation framework for LLM confidence calibration. Extracts True/False logits on TriviaQA; temperature scaling reduces ECE from 0.217→0.132 (Qwen-2.5-1.5B) without model modification.

nlp pytorch huggingface callibration temperature-scaling llm self-evaluation qwen smollm uncertainity-quantification

Updated May 31, 2026
Jupyter Notebook
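
The entry above mentions expected calibration error (ECE) and temperature scaling over True/False logits. The sketch below shows one common way to compute both, assuming equal-width confidence bins and a two-way softmax over the True and False logits. The function names, the bin count, and the use of P(True) as the confidence are assumptions for illustration, not the repository's actual code. In practice the temperature would be fitted on a held-out set.

```python
import math


def scaled_confidence(true_logit, false_logit, temperature=1.0):
    """P(True) from a two-way softmax over temperature-scaled logits.

    For two logits this reduces to sigmoid((true - false) / T).
    """
    return 1.0 / (1.0 + math.exp(-(true_logit - false_logit) / temperature))


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece
```

A temperature above 1 pulls confidences toward 0.5, which lowers ECE when the unscaled model is overconfident.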

augmentedivan / claude-self-score

Claude Code plugin that scores work quality against a 7-dimension rubric before task completion

code-quality ai-quality self-evaluation llm-tools claude-code claude-code-plugin

Updated Mar 19, 2026

Mk9207 / Adaptive-Metacognitive-Inference-Engine-

This engine models adaptive reasoning by integrating metacognitive feedback, enabling systems to refine their decision-making through self-evaluation and dynamic restructuring.

intelligent-systems inference-engine metacognition decision-modeling dynamic-optimization self-evaluation ai-architecture adaptive-reasoning cognitive-feedback system-restructuring