Research

Pushing reasoning to the edge.

Our research lab studies how agents deliberate, use tools reliably, and stay governable as they take on real authority. Here's what we're working on and what we've published.

Focus areas

Four lines of inquiry

Reasoning · 2026

Deliberate-then-Act: self-critique loops for high-stakes decisions

A framework where agents draft, critique, and revise a plan before committing — cutting downstream errors by 38% on operational benchmarks.

Read paper
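
As a rough illustration of the draft, critique, revise idea described above, here is a minimal Python sketch. It is not the paper's implementation: the function name `deliberate_then_act`, the three callables, and the `max_rounds` cap are all hypothetical names chosen for this example.

```python
from typing import Callable, List

Plan = List[str]


def deliberate_then_act(
    draft: Callable[[], Plan],
    critique: Callable[[Plan], List[str]],
    revise: Callable[[Plan, List[str]], Plan],
    max_rounds: int = 3,  # assumed cap so the loop always terminates
) -> Plan:
    """Draft a plan, then critique and revise it until no issues remain."""
    plan = draft()
    for _ in range(max_rounds):
        issues = critique(plan)
        if not issues:
            break  # the critic found nothing left to fix
        plan = revise(plan, issues)
    return plan


# Toy stand-ins for the model calls (purely illustrative).
def toy_draft() -> Plan:
    return ["deploy"]


def toy_critique(plan: Plan) -> List[str]:
    return [] if "backup" in plan else ["no backup before deploy"]


def toy_revise(plan: Plan, issues: List[str]) -> Plan:
    return ["backup"] + plan


final_plan = deliberate_then_act(toy_draft, toy_critique, toy_revise)
```

In this toy run the critic flags the missing backup once, the plan is revised, and the second critique pass comes back clean, so the agent would commit to the revised plan.
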
Tool use · 2026

Reliable function-calling under partial failure

How agents recover when an API times out, returns malformed data, or partially succeeds — with verification and graceful rollback.

Read paper
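
One way to picture recovery with verification and rollback is the sketch below. It is an assumption-laden illustration, not the method from the paper: `call_with_recovery`, `verify`, `rollback`, and `max_attempts` are hypothetical names, and a timeout is modeled with Python's built-in `TimeoutError`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_recovery(
    call: Callable[[], T],
    verify: Callable[[T], bool],
    rollback: Callable[[T], None],
    max_attempts: int = 3,  # assumed retry budget
) -> T:
    """Retry a tool call, verifying each result and undoing bad ones."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            result = call()
        except TimeoutError as exc:
            last_error = exc  # nothing was returned, so nothing to undo
            continue
        if verify(result):
            return result
        # Malformed or partial result: undo its side effects, then retry.
        rollback(result)
        last_error = ValueError(f"result failed verification: {result!r}")
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The design choice worth noting is that rollback runs only when a call returned something that failed verification; a timeout leaves no result to undo, although a real system would also need to check whether the timed-out call took effect on the remote side.
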
Edge inference · 2025

Sub-second reasoning on live operational streams

Distillation and caching techniques that bring multi-step reasoning to 120 ms median latency at the point of decision.

Read paper
Governance · 2025

Confidence-gated autonomy

Calibrating when an agent should act, ask, or escalate — so authority scales with demonstrated reliability.

Read paper
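
The act, ask, or escalate decision can be pictured as a simple threshold gate. The sketch below is only an illustration under stated assumptions: the names `Decision` and `gate`, the two threshold values, and the mapping of high, medium, and low confidence to the three outcomes are hypothetical and not taken from the paper.

```python
from enum import Enum


class Decision(Enum):
    ACT = "act"
    ASK = "ask"
    ESCALATE = "escalate"


def gate(
    confidence: float,
    act_threshold: float = 0.9,  # assumed value, for illustration only
    ask_threshold: float = 0.6,  # assumed value, for illustration only
) -> Decision:
    """Map a calibrated confidence score in [0, 1] to an autonomy level."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= act_threshold:
        return Decision.ACT        # confident enough to act alone
    if confidence >= ask_threshold:
        return Decision.ASK        # ask for confirmation first
    return Decision.ESCALATE       # hand the decision to a human
```

Under this framing, scaling authority with demonstrated reliability would amount to lowering `act_threshold` for task types where the agent's track record supports it.
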
Reasoning · 2025

Explanations as a first-class output

Training agents to produce human-auditable rationales that hold up under scrutiny — without degrading task accuracy.

Read paper
Tool use · 2025

SparkBench: an open benchmark for workflow agents

200 realistic enterprise tasks spanning CRM, ticketing, and analytics — open-sourced for the community to build on.

View benchmark

Read with us

Get new papers, benchmarks, and lab notes in your inbox. No noise — just the research.

Subscribe to the lab