BLOG
Writing from the Bento team on building agents that compound.
Notes, essays, and technical deep-dives on AI observability, evaluation, and the closed-loop systems we ship every week.
RESEARCH
Apr 18, 2026
2.6× higher scores on ARC-AGI-3 with a self-learning layer: same agent, same budget
We validated a self-learning engine on ARC-AGI-3. Same agent, same tools, same budget — 2.6× the score, 34% cheaper per successful outcome, three first-ever solves.
8 min read
PERSPECTIVE
Apr 20, 2026
Nature vs. Nurture in AI agents: diagnose the layer that's actually breaking
AI agents that work in pilots often degrade in production. It's usually a diagnostic failure, not a capability one — here's how to spot the layer that's actually breaking.
7 min read