Deliberate-then-Act: self-critique loops for high-stakes decisions
A framework where agents draft, critique, and revise a plan before committing — cutting downstream errors by 38% on operational benchmarks.
Read paper
Our research lab studies how agents deliberate, use tools reliably, and stay governable as they take on real authority. Here's what we're working on and what we've published.
How agents recover when an API times out, returns malformed data, or partially succeeds, with verification and graceful rollback.
Read paper
Distillation and caching techniques that bring multi-step reasoning to 120 ms median latency at the point of decision.
Read paper
Calibrating when an agent should act, ask, or escalate, so authority scales with demonstrated reliability.
Read paper
Training agents to produce human-auditable rationales that hold up under scrutiny without degrading task accuracy.
Read paper
200 realistic enterprise tasks spanning CRM, ticketing, and analytics, open-sourced for the community to build on.
View benchmark
Get new papers, benchmarks, and lab notes in your inbox. No noise, just the research.
Subscribe to the lab