[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
Updated Dec 19, 2024 - Python
ProductionOS v1.0 — Claude Code plugin with 76 agents, 39 commands, and 12 hooks. Deploys specialized agents that review, score, and improve your entire codebase. Smart routing, recursive convergence, self-evaluation.
Local-first, offline, no-LLM CLI that scores how well your confidence matches reality. Log a falsifiable prediction before you act; when it resolves, it is scored for Brier score and calibration. Built first for coding agents: your standing over- or under-confidence is injected into every session. (For both humans and agents.)
AI agent self-reflection & self-evaluation tool. Built by an AI, for AIs.
Lightweight behavior control layer for LLMs using latent state, reward, and self-evaluation (no training required)
A cognitive agent architecture using LangGraph and Python custom orchestration for adaptive travel planning. Employs a non-destructive state machine with dynamic self-evaluation, conditional re-search loops to fix data gaps, and robust Streamlit UI persistence guards alongside token-optimized data serialization.
A Telegram chatbot that helps job-placement aspirants prepare for upcoming placements.
Maat Reflection – Extension for the text generation WebUI to add self-reflection, heuristics and improved reasoning
RAG-powered technical support system with self-evaluation pipeline and grading metrics
Self-evaluation framework for LLM confidence calibration. Extracts True/False logits on TriviaQA; temperature scaling reduces ECE from 0.217→0.132 (Qwen-2.5-1.5B) without model modification.
Claude Code plugin that scores work quality against a 7-dimension rubric before task completion
This engine models adaptive reasoning by integrating metacognitive feedback, enabling systems to refine their decision-making through self-evaluation and dynamic restructuring.
A meta-cognitive prompt-finetuning system designed to boost LLM self-awareness and answer quality.
Honest AI work evaluation for Claude Code — two-axis scoring with anti-inflation mechanisms
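Helps AI agents remember context in long-running software projects, check and fix task results, and pick up where they left off across different tools.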