representation-engineering

Here are 28 public repositories matching this topic...

vgel / repeng

A library for making RepE control vectors

machine-learning transformers language-model sparse-autoencoders sae sparse-autoencoder saes representation-engineering

Updated Sep 24, 2025
Jupyter Notebook

IBM / activation-steering

[ICLR 2025] General-purpose activation steering library

alignment steering refusal representation-engineering activation-steering llm-steering

Updated Sep 18, 2025
Python

chrisliu298 / awesome-representation-engineering

A resource repository for representation engineering in large language models

awesome alignment llm large-language-model representation-engineering

Updated Nov 14, 2024

steering-vectors / steering-vectors

Steering vectors for transformer language models in PyTorch / Hugging Face

nlp ai pytorch gpt huggingface mechanistic-interpretability representation-engineering

Updated Feb 21, 2025
Python

AISmithLab / CoBRA

[🏆 CHI26 Best Paper] CoBRA: Reproducible control of LLM agent behavior via classic social science experiments

reproducible-research agent-based-modeling hci representation-learning social-simulation social-science ai-tools ai-agent llm-agent representation-engineering llm-behavior

Updated Apr 21, 2026
Python

MaxBelitsky / cache-steering

KV Cache Steering for Inducing Reasoning in Small Language Models

reasoning kv-cache large-language-models llm representation-engineering activation-steering reasoning-language-models cache-steering

Updated Jul 24, 2025
Python

krnel-ai / krnel-graph

Lightweight representation engineering dataflow operations for agent developers.

transformers pytorch dataflow parquet huggingface huggingface-transformers duckdb pylance mechanistic-interpretability lancedb transformerlens representation-engineering pragmatic-interpretability

Updated May 27, 2026
Python

a9lim / saklas

Activation steering and trait monitoring for HuggingFace transformers

python ai transformers interpretability huggingface llm representation-engineering activation-steering

Updated May 31, 2026
Python

erfanshayegani / Multimodal-Alignment-BlindSpots

[🔥 ICLR 2026] - Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots

alignment ood interpretability post-training multimodal chat-template vision-language-model representation-engineering

Updated Mar 16, 2026
Jupyter Notebook

GinoShun / Accent-Activation-Steering

Official code for "Activation Steering for Accent Adaptation in Speech Foundation Models" (Interspeech 2026). Parameter-free accent adaptation via mean-shift steering vectors — no weight updates, consistent WER reductions across 8 accents.

speech-recognition whisper asr interspeech accent-adaptation representation-engineering activation-steering qwen2-audio

Updated Mar 17, 2026
Python

Pomilon-Intelligence-Lab / CRSM

CRSM (Continuous Reasoning State Model): An asynchronous "System 2" architecture that implements Hierarchical State Sovereignty within a Mamba backbone. Unlike traditional search wrappers, CRSM uses Forward-Projected Planning and Sparse-Gated Injection to steer latent manifolds in real-time, decoupling strategic reasoning from token generation.

research reinforcement-learning ai pytorch mcts rl control-theory ssm mamba reasoning state-space-models system-2 large-language-models llm brain-inspired-ai representation-engineering asynchronous-systems

Updated Apr 24, 2026
Python

oleksandr-shyshchuk / tool-probe

Pre-generation tool-call gating via linear probes on LLM hidden states. F1 ≈ 0.91–0.94 on BFCL v4, 14–22× faster than full generation. Cross-architecture transfer across Llama / Qwen / Phi / Mistral (3B–7B) with ≥96% retention.

reproducible-research ai-safety hidden-states interpretability probing tool-use llm mechanistic-interpretability function-calling activation-patching representation-engineering bfcl linear-probe

Updated May 8, 2026
Jupyter Notebook

levashi / reprobe

Phase-aware LLM activation steering and linear probing. A memory-efficient, practical implementation of Representation Engineering (RepE) for safety research.

transformers pytorch ai-safety mechanistic-interpretability llm-safety representation-engineering activation-steering linear-probes

Updated Apr 1, 2026
Python

Pomilon-Intelligence-Lab / ALSI

Early baby steps towards a long-term vision regarding Mamba-2's state interpretability.

Updated Feb 4, 2026
Python

memo-ozdincer / RRFA

Representation Rerouting for Agentic Safety: Defending LLM Agents against Prompt Injection via Circuit Breakers and Triplet Loss.

circuit-breakers safety lora triplet-loss adversarial-robustness prompt-injection llm-agents representation-engineering

Updated Apr 9, 2026
Python

tobs-code / llm-control-circuits

Mechanistic interpretability experiments on political control circuits, refusal behavior, concept steering, and late-decoder interactions in open LLMs.

deep-learning transformers alignment censorship interpretability llm mechanistic-interpretability qwen representation-engineering steering-vectors

Updated May 16, 2026
Python

dimagoodlookingagent / paper1-emotion-steering

Code, vectors, and figures for the paper 'Emotion and authorization steering both move cheat; trained-probe suppression doesn't undo it: a mechanistic study in Gemma-2-2B'

gemma ai-safety ai-alignment llm mechanistic-interpretability representation-engineering activation-steering

Updated May 26, 2026
HTML

VicBa2000 / pathos-engine

Functional emotional architecture for LLMs — 42 systems, 1994 tests, 27 psychological theories. Emergent emotions via 7 ANIMA pillars: predictive processing, global workspace, autobiographical memory, ontogenic development, motivational drives, emotional discovery, computational phenomenology.

react python typescript emotion emotions affective-computing fastapi psicology emotional-ai llm big-five representation-engineering steering-vectors appraisal-theory

Updated May 26, 2026
Python

0SxD / ce-model-native-skills-bundle

llm representation-engineering agent-skills research-tooling steering-vectors

Updated May 2, 2026

isaac-6 / geometric-latent-biopsy

LatentBiopsy: Geometric Anomaly Detection for LLM Residual Streams.

ai-safety jailbreak-detection llm mechanistic-interpretability representation-engineering

Updated Apr 2, 2026
Python
