Frequently asked questions.
Everything teams ask before they ship Bento into production.
PLATFORM
We already use LangSmith or Langfuse. Why do we need Bento?
Trace tools tell you what happened. Bento closes the loop. LangSmith and Langfuse stop at observability. Bento takes the next steps: it reads your production traces, distils failure patterns into reusable skills, and injects them into the next run. The manual loop of reading traces, spotting patterns, and writing patches is exactly the work Bento replaces.
Do we need to rewrite our agent to use Bento?
No rewrites. Drop in the OTel SDK and you're instrumented. Most teams are capturing full-fidelity traces in under 30 minutes. Nudges, skills, and evaluation are additive — you opt in feature by feature, and you can run Bento alongside your existing observability stack.
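If you already route telemetry through an OpenTelemetry Collector, adding Bento is one more exporter entry, so your current backend keeps receiving everything it does today. A minimal sketch; the `ingest.bento.dev` endpoint and `x-bento-api-key` header are illustrative placeholders, not documented values:

```yaml
# collector.yaml: dual-export pipeline. The Bento endpoint and header
# below are illustrative placeholders, not documented values.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  # Keep your existing observability backend...
  otlphttp/existing:
    endpoint: https://otel.your-apm.example.com
  # ...and add Bento as a second destination.
  otlphttp/bento:
    endpoint: https://ingest.bento.dev
    headers:
      x-bento-api-key: ${env:BENTO_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/existing, otlphttp/bento]
```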
We built a custom agent framework. Will Bento work with it?
Yes. Bento is framework-agnostic. It doesn't care whether you're using LangChain, LlamaIndex, the Vercel AI SDK, AutoGen, a custom framework, or raw provider calls. If your agent takes a task and returns a result, Bento can wrap it.
What happens when a new model drops? Do we lose our improvements?
No. Bento's output is readable files, not model weights. Skills and rules live in your repo as Markdown and YAML. They're model-agnostic, so when you upgrade your model, your accumulated knowledge transfers automatically. No retraining, no rebuild.
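To make "readable files" concrete, here is what a promoted skill could look like on disk. The schema is invented for this example; treat every field name as an assumption, not Bento's documented format:

```yaml
# skills/checkout-refund.yaml: hypothetical skill file (illustrative schema).
name: checkout-refund-validation
trigger: "user requests a refund on the checkout agent"
guidance: |
  Before calling stripe.refund, confirm the order exists and the charge
  has settled. If the charge is disputed, escalate instead of refunding.
evidence:
  promoted_from: 37 production traces   # example figure, not a real stat
status: promoted
```

Because it is plain text, a model upgrade changes nothing here: the same file rides along to the new model.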
MONITOR
How is this different from Datadog, New Relic, or our existing APM?
APMs speak HTTP and databases. Bento speaks agent. Traditional APMs don't understand tool calls, token spend, prompt content, or decision branches. Bento is OpenTelemetry-native but built around agent semantics — so you see cost per run, tool failure rates, drift over prompt versions, and regressions across deploys out of the box.
What does Bento capture out of the box?
Everything that matters for a run. Every span, tool call, token, and outcome. Full-fidelity traces across your stack, cost and latency per node, structured errors, user feedback signals, and rule-based alerts defined in plain natural language. Retention is configurable — 30 days is the default, longer on request.
Can we alert on custom signals — cost, drift, or errors?
Yes. Alerts are defined in plain natural language. Examples: "error_rate above 5% for 5 minutes", "cost per run above $0.30 on the checkout agent", or "tool call to stripe.refund returning 4xx". Rules compile to trace queries and route to Slack, PagerDuty, email, or webhook.
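As a sketch, the second example above might live in your repo as a rule file like this; the schema is assumed for illustration, and only the condition string comes from the list above:

```yaml
# alerts/checkout-cost.yaml: hypothetical alert rule (illustrative schema).
rule: "cost per run above $0.30 on the checkout agent"   # plain-language condition
window: 15m                 # evaluation window, an assumed default
notify:
  - slack: "#agent-alerts"
  - pagerduty: checkout-oncall
```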
IMPROVE
How is this different from prompt optimisers like DSPy or OPRO?
Prompt optimisers tune wording. Bento improves full agent behaviour: tool selection, routing rules, decision checklists, mid-run interventions. Prompt wording accounts for roughly 20% of production failures; Bento addresses the other 80%.
How do nudges actually get created?
Automatically from production failures. Candidate nudges are authored by the engine and promoted by evidence. When Bento detects a recurring failure pattern across traces, it drafts a trigger-based nudge that fires on the matching condition. The nudge ships in shadow mode, is scored against the outcome it was meant to fix, and graduates to 100% traffic only if it provably helps.
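A hypothetical nudge artifact makes that lifecycle concrete; every field name here is an assumption, not Bento's documented schema:

```yaml
# nudges/refund-double-check.yaml: hypothetical nudge (illustrative schema).
trigger: "stripe.refund drafted without a prior order lookup in the same run"
intervention: |
  Look up the order and verify the charge status before issuing a refund.
target_metric: refund_error_rate   # the outcome the nudge is scored against
status: shadow                     # lifecycle: shadow -> promoted -> archived
traffic: 0%                        # shadow mode: scored, not yet injected
```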
What if a nudge makes things worse?
It gets demoted. Every nudge is scored against production outcomes. Shadow mode catches regressions before promotion. Post-promotion, nudges that stop improving outcomes are archived automatically. Every artifact is a readable file you can inspect, edit, or revert in git.
Do we need a training set or labelled data?
No. Your production traces are the dataset. Bento learns from the runs you already have. Labelled eval sets are useful if you want tighter guardrails during evaluation, but they're not required for the improvement loop to work.
EVALUATE
How do evaluations run — offline, in CI, or in production?
All three. Every prompt, model, or skill change is compared against your production history before it ships. Bento runs evals in CI against curated production datasets, supports canary rollouts for risky changes, and continuously scores live runs once they land. You see regressions before your users do.
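In CI this is typically a single pipeline step. A sketch using GitHub Actions; the `bento evals run` command and its flags are assumptions for illustration, not a documented CLI:

```yaml
# .github/workflows/evals.yml: illustrative CI gate (hypothetical CLI).
name: bento-evals
on: [pull_request]
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical command: compare this change against curated
      # production datasets and fail the build on regression.
      - run: bento evals run --against production --fail-on-regression
        env:
          BENTO_API_KEY: ${{ secrets.BENTO_API_KEY }}
```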
What evaluators does Bento support?
Code, LLM-as-judge, and human review, in parallel. Any evaluator type on any release. Deterministic code checks, LLM-judge rubrics, regex/JSON validators, and human-in-the-loop review all run concurrently against the same release. Results aggregate into a single gate.
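A sketch of what that gate could look like as a file; the schema is illustrative rather than a documented format:

```yaml
# evals/release-gate.yaml: hypothetical evaluator config (illustrative schema).
evaluators:
  - type: code                     # deterministic check, pass/fail
    script: checks/validate_refund_json.py
  - type: llm-judge                # rubric scored by a judge model
    rubric: rubrics/helpfulness.md
  - type: regex                    # cheap structural validator
    pattern: '"status":\s*"(succeeded|refunded)"'
  - type: human                    # sampled human-in-the-loop review
    sample_rate: 0.05
gate:
  aggregate: all-pass              # one gate over all evaluator results
```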
Do we need to hand-curate eval datasets?
Not usually. Production datasets are curated automatically from your traces. Bento groups runs by outcome, intent, and user segment, and surfaces the slices worth evaluating against. You can always promote specific runs to a pinned dataset for regression guarantees.
SECURITY & DEPLOYMENT
How does Bento handle sensitive data in production traces?
PII redaction happens at capture time, in your infrastructure. Nothing sensitive leaves your VPC unless you want it to. Our OpenTelemetry collector ships with configurable scrubbing rules for names, emails, tokens, and custom patterns. Self-hosted deployments are available for regulated environments.
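Concretely, this kind of scrubbing can be expressed with the standard OpenTelemetry Collector `attributes` processor. The keys below are examples; the defaults Bento ships are configurable and may differ:

```yaml
# Capture-time redaction, inside your VPC. Add attributes/scrub to the
# processors list of your traces pipeline so it runs before any exporter.
processors:
  attributes/scrub:
    actions:
      - key: user.email
        action: hash             # keeps cardinality, drops the raw value
      - key: user.name
        action: delete
      - key: http.request.header.authorization
        action: delete           # never export bearer tokens
```

Because the processor runs in the collector, redaction happens before a span leaves your network.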
Can we self-host Bento?
Yes. Fully self-hosted deployments are available on request. The collector, storage, and learning engine can run inside your VPC or on-prem. We support AWS, GCP, and Azure, and can sit behind your existing SSO (SAML or OIDC).
Is Bento SOC 2 compliant?
SOC 2 Type II, in progress. GDPR-ready today. We have followed SOC 2 controls from day one and are in the middle of our Type II audit. Enterprise customers get DPAs, sub-processor lists, and penetration test reports on request.
PRICING & ONBOARDING
How is pricing structured?
Usage-based, with a flat platform fee. You pay for traces ingested and skills promoted, not seats. There's a free tier for individual developers and a startup tier for small teams. Enterprise pricing scales with trace volume and includes dedicated support. No per-seat surprises.
How long does onboarding take?
Minutes for instrumentation, a week for full value. First traces in production within 30 minutes; first promoted skill within seven days. Teams typically land the OTel SDK on day one, configure their first alerts in the same week, and see the improvement loop contribute measurable gains by the end of the first sprint.
What do we need to get started?
An agent in production and an engineer for an hour. No ML team required. Most integrations are a few lines of code. Bento works with any callable agent, and anyone on your team who can write a YAML file can author skills and rules.