Frequently asked questions.
Everything teams ask before they ship Bento into production.
PLATFORM
We already use LangSmith or Langfuse. Why do we need Bento?
Trace tools tell you what happened. Bento closes the loop. LangSmith and Langfuse stop at observability. Bento takes the next steps: it reads your production traces, distils failure patterns into reusable skills, and injects them into the next run. The manual loop of reading traces, spotting patterns, and writing patches is exactly the work Bento replaces.
Do we need to rewrite our agent to use Bento?
No rewrites. Drop in the OTel SDK and you're instrumented. Most teams are capturing full-fidelity traces in under 30 minutes. Nudges, skills, and evaluation are additive — you opt in feature by feature, and you can run Bento alongside your existing observability stack.
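If you already route telemetry through an OpenTelemetry Collector, adding Bento is one more exporter entry, so your current backend keeps receiving everything it does today. A minimal sketch; the `ingest.bento.dev` endpoint and `x-bento-api-key` header are illustrative placeholders, not documented values:

```yaml
# collector.yaml: dual-export pipeline. The Bento endpoint and header
# below are illustrative placeholders, not documented values.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  # Keep your existing observability backend...
  otlphttp/existing:
    endpoint: https://otel.your-apm.example.com
  # ...and add Bento as a second destination.
  otlphttp/bento:
    endpoint: https://ingest.bento.dev
    headers:
      x-bento-api-key: ${env:BENTO_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/existing, otlphttp/bento]
```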
We built a custom agent framework. Will Bento work with it?
Yes. Bento is framework-agnostic. It doesn't care whether you're using LangChain, LlamaIndex, the Vercel AI SDK, AutoGen, a custom framework, or raw provider calls. If your agent takes a task and returns a result, Bento can wrap it.
What happens when a new model drops? Do we lose our improvements?
No. Bento's output is readable files, not model weights. Skills and rules live in your repo as Markdown and YAML. They're model-agnostic, so when you upgrade your model, your accumulated knowledge transfers automatically. No retraining, no rebuild.
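To make "readable files" concrete, here is what a promoted skill could look like on disk. The schema is invented for this example; treat every field name as an assumption, not Bento's documented format:

```yaml
# skills/checkout-refund.yaml: hypothetical skill file (illustrative schema).
name: checkout-refund-validation
trigger: "user requests a refund on the checkout agent"
guidance: |
  Before calling stripe.refund, confirm the order exists and the charge
  has settled. If the charge is disputed, escalate instead of refunding.
evidence:
  promoted_from: 37 production traces   # example figure, not a real stat
status: promoted
```

Because it is plain text, a model upgrade changes nothing here: the same file rides along to the new model.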
MONITOR
How is this different from Datadog, New Relic, or our existing APM?
APMs speak HTTP and databases. Bento speaks agent. Traditional APMs don't understand tool calls, token spend, prompt content, or decision branches. Bento is OpenTelemetry-native but built around agent semantics — so you see cost per run, tool failure rates, drift over prompt versions, and regressions across deploys out of the box.
What does Bento capture out of the box?
Everything that matters for a run. Every span, tool call, token, and outcome. Full-fidelity traces across your stack, cost and latency per node, structured errors, user feedback signals, and rule-based alerts defined in plain natural language. Retention is configurable — 30 days is the default, longer on request.
Can we alert on custom signals — cost, drift, or errors?
Yes. Alerts are defined in plain natural language. Examples: "error_rate above 5% for 5 minutes", "cost per run above $0.30 on the checkout agent", or "tool call to stripe.refund returning 4xx". Rules compile to trace queries and route to Slack, PagerDuty, email, or webhook.
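As a sketch, the second example above might live in your repo as a rule file like this; the schema is assumed for illustration, and only the condition string comes from the list above:

```yaml
# alerts/checkout-cost.yaml: hypothetical alert rule (illustrative schema).
rule: "cost per run above $0.30 on the checkout agent"   # plain-language condition
window: 15m                 # evaluation window, an assumed default
notify:
  - slack: "#agent-alerts"
  - pagerduty: checkout-oncall
```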
IMPROVE
How is this different from prompt optimisers like DSPy or OPRO?
Prompt optimisers tune wording. Bento improves full agent behaviour: tool selection, routing rules, decision checklists, mid-run interventions. Prompt wording accounts for roughly 20% of production failures; Bento addresses the other 80%.
How do nudges actually get created?
Automatically from production failures. Candidate nudges are authored by the engine and promoted by evidence. When Bento detects a recurring failure pattern across traces, it drafts a trigger-based nudge that fires on the matching condition. The nudge ships in shadow mode, is scored against the outcome it was meant to fix, and graduates to 100% traffic only if it provably helps.
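A hypothetical nudge artifact makes that lifecycle concrete; every field name here is an assumption, not Bento's documented schema:

```yaml
# nudges/refund-double-check.yaml: hypothetical nudge (illustrative schema).
trigger: "stripe.refund drafted without a prior order lookup in the same run"
intervention: |
  Look up the order and verify the charge status before issuing a refund.
target_metric: refund_error_rate   # the outcome the nudge is scored against
status: shadow                     # lifecycle: shadow -> promoted -> archived
traffic: 0%                        # shadow mode: scored, not yet injected
```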
What if a nudge makes things worse?
It gets demoted. Every nudge is scored against production outcomes. Shadow mode catches regressions before promotion. Post-promotion, nudges that stop improving outcomes are archived automatically. Every artifact is a readable file you can inspect, edit, or revert in git.
Do we need a training set or labelled data?
No. Your production traces are the dataset. Bento learns from the runs you already have. Labelled eval sets are useful if you want tighter guardrails during evaluation, but they're not required for the improvement loop to work.
EVALUATE
How do evaluations run — offline, in CI, or in production?
All three. Every prompt, model, or skill change is compared against your production history before it ships. Bento runs evals in CI against curated production datasets, supports canary rollouts for risky changes, and continuously scores live runs once they land. You see regressions before your users do.
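In CI this is typically a single pipeline step. A sketch using GitHub Actions; the `bento evals run` command and its flags are assumptions for illustration, not a documented CLI:

```yaml
# .github/workflows/evals.yml: illustrative CI gate (hypothetical CLI).
name: bento-evals
on: [pull_request]
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical command: compare this change against curated
      # production datasets and fail the build on regression.
      - run: bento evals run --against production --fail-on-regression
        env:
          BENTO_API_KEY: ${{ secrets.BENTO_API_KEY }}
```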
What evaluators does Bento support?
Code, LLM-as-judge, and human review, in parallel. Any evaluator type on any release. Deterministic code checks, LLM-judge rubrics, regex/JSON validators, and human-in-the-loop review all run concurrently against the same release. Results aggregate into a single gate.
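A sketch of what that gate could look like as a file; the schema is illustrative rather than a documented format:

```yaml
# evals/release-gate.yaml: hypothetical evaluator config (illustrative schema).
evaluators:
  - type: code                     # deterministic check, pass/fail
    script: checks/validate_refund_json.py
  - type: llm-judge                # rubric scored by a judge model
    rubric: rubrics/helpfulness.md
  - type: regex                    # cheap structural validator
    pattern: '"status":\s*"(succeeded|refunded)"'
  - type: human                    # sampled human-in-the-loop review
    sample_rate: 0.05
gate:
  aggregate: all-pass              # one gate over all evaluator results
```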
Do we need to hand-curate eval datasets?
Not usually. Production datasets are curated automatically from your traces. Bento groups runs by outcome, intent, and user segment, and surfaces the slices worth evaluating against. You can always promote specific runs to a pinned dataset for regression guarantees.
SECURITY & DEPLOYMENT
How does Bento handle sensitive data in production traces?
PII redaction happens at capture time, in your infrastructure. Nothing sensitive leaves your VPC unless you want it to. Our OpenTelemetry collector ships with configurable scrubbing rules for names, emails, tokens, and custom patterns. Self-hosted deployments are available for regulated environments.
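Concretely, this kind of scrubbing can be expressed with the standard OpenTelemetry Collector `attributes` processor. The keys below are examples; the defaults Bento ships are configurable and may differ:

```yaml
# Capture-time redaction, inside your VPC. Add attributes/scrub to the
# processors list of your traces pipeline so it runs before any exporter.
processors:
  attributes/scrub:
    actions:
      - key: user.email
        action: hash             # keeps cardinality, drops the raw value
      - key: user.name
        action: delete
      - key: http.request.header.authorization
        action: delete           # never export bearer tokens
```

Because the processor runs in the collector, redaction happens before a span leaves your network.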
Can we self-host Bento?
Yes. Fully self-hosted deployments are available on request. The collector, storage, and learning engine can run inside your VPC or on-prem. We support AWS, GCP, and Azure, and can sit behind your existing SSO (SAML or OIDC).
Is Bento SOC 2 compliant?
SOC 2 Type II, in progress. GDPR-ready today. We have followed SOC 2 controls from day one and are in the middle of our Type II audit. Enterprise customers get DPAs, sub-processor lists, and penetration test reports on request.
PRICING & ONBOARDING
How is pricing structured?
Usage-based, with a flat platform fee. You pay for traces ingested and skills promoted, not seats. There's a free tier for individual developers and a startup tier for small teams. Enterprise pricing scales with trace volume and includes dedicated support. No per-seat surprises.
How long does onboarding take?
Minutes for instrumentation, a week for full value. First traces in production within 30 minutes; first promoted skill within seven days. Teams typically land the OTel SDK on day one, configure their first alerts in the same week, and see the improvement loop contribute measurable gains by the end of the first sprint.
What do we need to get started?
An agent in production and an engineer for an hour. No ML team required. Most integrations are a few lines of code. Bento works with any callable agent, and anyone on your team who can write a YAML file can author skills and rules.