Chatbot Conversation Flow Simulator

Prototype intent routing before you commit to a platform. Define intents and trigger phrases, set a confidence threshold and an escalation rule, then fire test messages at the flow and watch which branch wins — with a score breakdown under every reply. Free, no signup, runs entirely in your browser.

Design a flow on the left, stress-test it on the right

Edit intents, trigger phrases, and the confidence threshold, then type test messages into the chat preview. Every bot reply comes with a score breakdown showing exactly why that branch fired. Runs entirely in your browser — nothing is sent to a server.

Flow editor

Confidence threshold: 0.45

Lower = fewer fallbacks but more wrong matches. Higher = stricter matching but more "sorry, I didn't catch that". Production NLU defaults usually sit at 0.4-0.7.

Fallback reply

Escalate after

Handoff message

Chat preview

Type a message below or tap a quick test to start.

What this simulator does (and what it deliberately doesn't)

The simulator reproduces the routing layer every chatbot builder puts between the user's message and the bot's reply: intent recognition. You define intents, each with a handful of trigger phrases and a reply. When you send a test message, the engine scores it against every intent, compares the best score to your confidence threshold, and fires one of four branches: a matched reply, a fallback, a user-requested handoff, or a forced escalation after repeated fallbacks.
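The four branches described above can be sketched as a single decision function. This is a minimal illustration, not the simulator's actual code: the function name `route`, the intent name `handoff`, and the choice to escalate on the Nth consecutive fallback are all assumptions made for the example.

```python
from typing import Optional


def route(
    scores: dict[str, float],
    threshold: float,
    prior_fallbacks: int,
    escalate_after: int = 2,
    handoff_intent: str = "handoff",  # hypothetical name for the "talk to a human" intent
) -> tuple[str, Optional[str]]:
    """Pick one of four branches: 'match', 'handoff', 'fallback', or 'escalate'.

    `scores` maps intent name to score for the current message.
    `prior_fallbacks` is how many fallbacks happened in a row before this turn.
    """
    if scores:
        best_intent = max(scores, key=scores.get)
        if scores[best_intent] >= threshold:
            # The best intent is eligible: either the user asked for a human
            # or a normal matched reply fires.
            branch = "handoff" if best_intent == handoff_intent else "match"
            return branch, best_intent
    # Nothing cleared the threshold. Assumption: this miss counts toward the cap,
    # so the Nth consecutive miss escalates instead of falling back again.
    if prior_fallbacks + 1 >= escalate_after:
        return "escalate", None
    return "fallback", None
```

Calling `route({"refund": 0.1}, 0.45, prior_fallbacks=1)` returns the escalation branch, because the second miss in a row hits the default cap of two.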

The matching here is deliberately simple — token overlap with exact-phrase bonuses, computed client-side so every score is explainable. Production platforms use trained NLU models or LLM classification, which generalize far better to phrasings you never wrote down. That difference is the point: if your flow logic breaks under a primitive matcher, sharper NLU will only mask the structural problem, not fix it. The simulator tests your flow design — intent boundaries, threshold, escalation policy — not vendor accuracy.
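To make "token overlap with exact-phrase bonuses" concrete, here is one way such a matcher could be written. It is a sketch under stated assumptions: the Jaccard-style overlap, the 0.3 bonus, and the cap at 1.0 are illustrative choices, not the simulator's real weights, and stemming is left out for brevity.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_intent(message: str, phrases: list[str], phrase_bonus: float = 0.3) -> float:
    """Score a message against one intent's trigger phrases; the best phrase wins.

    Overlap is shared tokens divided by all distinct tokens in message and phrase.
    The bonus (an assumed value) applies when the whole phrase appears verbatim.
    """
    msg_tokens = tokenize(message)
    msg_set = set(msg_tokens)
    msg_joined = f" {' '.join(msg_tokens)} "
    best = 0.0
    for phrase in phrases:
        phrase_tokens = tokenize(phrase)
        if not phrase_tokens:
            continue
        phrase_set = set(phrase_tokens)
        score = len(msg_set & phrase_set) / len(msg_set | phrase_set)
        # Pad with spaces so the phrase only matches on whole-token boundaries.
        if f" {' '.join(phrase_tokens)} " in msg_joined:
            score += phrase_bonus
        best = max(best, min(score, 1.0))
    return best
```

With these assumed weights, "I'd like my money back" scores well against the phrase "money back", while "please reimburse me" scores zero against every refund phrase. That is exactly the kind of gap a trained NLU model would close and a token matcher will not.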

How to read the score panel

Tap "Why this reply?" under any bot message to see the top three intent scores. A score at or above the threshold (green) is eligible to fire; the highest eligible intent wins. Three patterns worth watching for:

Two intents scoring within ~0.1 of each other on the same message means their trigger phrases overlap. In production this becomes misrouting that's hard to debug, so split the shared vocabulary or merge the intents.
A correct match barely clearing the threshold means your trigger phrases are too narrow. Add phrasings that use different vocabulary for the same need ("money back" vs "refund" vs "return this").
Off-topic messages scoring above zero everywhere is normal — that's why the threshold exists. If a clearly unrelated message fires an intent, raise the threshold before blaming the phrases.
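The first two patterns can also be checked mechanically. The sketch below takes a message's intent scores and returns warning flags. The flag names, the 0.05 "barely cleared" margin, and the decision to report near-ties only when the leader is eligible are assumptions for illustration.

```python
def diagnose(
    scores: dict[str, float],
    threshold: float,
    tie_margin: float = 0.1,       # "within ~0.1 of each other"
    narrow_margin: float = 0.05,   # assumed definition of "barely clearing"
) -> list[str]:
    """Return warning flags for one message's intent scores."""
    # Same view as the score panel: top three intents, highest first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    flags: list[str] = []
    if not ranked:
        return flags
    best_score = ranked[0][1]
    if best_score < threshold:
        return flags  # nothing fires, so there is nothing to misroute
    if len(ranked) >= 2 and best_score - ranked[1][1] <= tie_margin:
        flags.append("near-tie")
    if best_score < threshold + narrow_margin:
        flags.append("barely-cleared")
    return flags
```

A result of `["near-tie"]` points at overlapping trigger phrases. `["barely-cleared"]` suggests the winning intent needs more varied phrasings.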

Five flow-design rules this tool helps you test

1. The fallback is a feature, not a failure

A bot that never says "I didn't catch that" is a bot matching things it shouldn't. A useful fallback names what the bot can do and offers a path to a human; the example flow's fallback does both. What you're tuning is the fallback rate: drop the threshold until wrong matches appear, then back off one notch.
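The "drop the threshold until wrong matches appear" procedure amounts to a small sweep over labeled test messages. The sketch below assumes you have already recorded, for each test message, the top-scoring intent, its score, and the intent you expected (`None` for off-topic messages). The tuple layout and the name `sweep` are hypothetical.

```python
from typing import Optional

# (predicted_intent, score, expected_intent or None for off-topic messages)
Result = tuple[str, float, Optional[str]]


def sweep(results: list[Result], thresholds: list[float]) -> list[tuple[float, float, int]]:
    """For each threshold, return (threshold, fallback_rate, wrong_match_count)."""
    rows = []
    for t in thresholds:
        fallbacks = sum(1 for _, score, _ in results if score < t)
        # A wrong match is a reply that fired for an intent the author did not expect.
        wrong = sum(1 for pred, score, exp in results if score >= t and pred != exp)
        rows.append((t, fallbacks / len(results), wrong))
    return rows


# Made-up scores, purely for illustration.
sample = [
    ("refund", 0.9, "refund"),
    ("refund", 0.5, "refund"),
    ("shipping", 0.4, None),   # off-topic message that scores fairly high
    ("refund", 0.2, None),
]
for row in sweep(sample, [0.3, 0.45, 0.6]):
    print(row)
```

On this sample data, the lowest threshold lets the off-topic message fire, and the highest one sends a correct refund match to fallback. The middle setting is the "one notch back" the rule describes.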

2. Always give "talk to a human" its own intent

Users who ask for a person and get another bot menu are the single biggest source of one-star feedback in production deployments. A dedicated handoff intent with phrases like "real person" and "agent" should fire even mid-flow. Try it in the preview: the example flow routes it straight to the handoff message.

3. Cap consecutive fallbacks at two

The escalation rule ("escalate after N consecutive fallbacks") protects users from the doom loop where every message lands in fallback. Two is the sane default: one fallback is a normal miss, two in a row means the bot doesn't cover this topic and a human should take over. Test it by sending two gibberish messages back to back.
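The word "consecutive" matters: a successful match in between resets the count. A minimal sketch of that counter follows, assuming that escalation fires on the Nth miss in a row and that the streak restarts afterwards (both are assumptions about behavior, not confirmed details of the simulator).

```python
def escalation_turns(matched: list[bool], escalate_after: int = 2) -> list[int]:
    """Return the 1-based turn numbers at which escalation would fire.

    `matched[i]` is True when turn i+1 produced a matched reply and False
    when it fell through to fallback.
    """
    fired: list[int] = []
    streak = 0
    for turn, ok in enumerate(matched, start=1):
        if ok:
            streak = 0  # any successful match breaks the streak
            continue
        streak += 1
        if streak >= escalate_after:
            fired.append(turn)
            streak = 0  # assumed: the count restarts after a handoff
    return fired
```

Two gibberish messages back to back, `[False, False]`, escalate on turn 2. The alternating pattern `[False, True, False, True]` never escalates, which is worth knowing if your users tend to mix on-topic and off-topic messages.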

4. Keep intents wide apart in vocabulary

Intent boundaries fail where vocabulary overlaps. "Shipping cost" and "order status" both attract the word "delivery" — send "delivery update" vs "delivery fee" to the example flow and watch the scores diverge on the second token. If two of your real intents share their highest-signal words, users will hit the wrong branch no matter how good the NLU is. The same boundary discipline applies to small-talk handling — greetings need either their own intent or a deliberate decision to let them fall through.
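You can spot overlapping vocabulary before sending a single test message by intersecting the token sets of every pair of intents. The sketch below is illustrative only: the intent names, trigger phrases, and stopword list are hypothetical stand-ins, not the example flow's real contents.

```python
import re
from itertools import combinations

# Assumed stopword list: common words that carry no routing signal.
STOPWORDS = frozenset({"a", "an", "the", "is", "my", "i", "do", "how", "to"})


def vocab(phrases: list[str]) -> set[str]:
    """All distinct lowercase tokens used across an intent's trigger phrases."""
    return {tok for p in phrases for tok in re.findall(r"[a-z0-9']+", p.lower())}


def shared_vocabulary(intents: dict[str, list[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of intents to the non-stopword tokens they share."""
    overlaps = {}
    for a, b in combinations(sorted(intents), 2):
        common = (vocab(intents[a]) & vocab(intents[b])) - STOPWORDS
        if common:
            overlaps[(a, b)] = common
    return overlaps


# Hypothetical flow in which "delivery" appears in two intents.
flow = {
    "shipping_cost": ["delivery fee", "how much is shipping"],
    "order_status": ["delivery update", "where is my order"],
    "refund": ["money back"],
}
print(shared_vocabulary(flow))
```

Any pair that shows up in the result is a candidate for splitting the shared word out or merging the two intents.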

5. Test with messages you didn't write

The flow always works when its author tests it with the trigger phrases. The quick-test chips are intentionally phrased differently from the example flow's triggers ("I'd like my money back" vs "how do i get a refund") — that gap between authored phrases and real phrasings is exactly what your production NLU has to close. Borrow real wording from support tickets, not from your own head.

From prototype to production

Once the flow holds up here, the structure transfers directly to real builders: intents map to Dialogflow intents, Botpress nodes, or Manychat keyword rules; the threshold maps to the NLU confidence setting; the escalation rule maps to your handoff trigger. Two things this simulator can't model that production flows need: entity extraction (pulling the order number out of "where is order #4412") and session context (remembering that number on the next turn). Plan both before launch; our conversation flow guide covers the full design process step by step.
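For a feel of what those two missing pieces involve, here is a deliberately small sketch. It assumes order numbers are written as "#" followed by digits and that session state is a plain dictionary. Real builders expose their own entity and context mechanisms, and the function names here are hypothetical.

```python
import re
from typing import Optional

ORDER_RE = re.compile(r"#(\d+)")  # assumed format: "#" followed by digits


def extract_order_number(message: str) -> Optional[str]:
    """Entity extraction: pull the order number out of the message, if present."""
    match = ORDER_RE.search(message)
    return match.group(1) if match else None


def handle_turn(message: str, session: dict) -> Optional[str]:
    """Session context: remember the latest order number across turns."""
    number = extract_order_number(message)
    if number is not None:
        session["order_number"] = number
    return session.get("order_number")


session: dict = {}
first = handle_turn("where is order #4412", session)   # number extracted here
second = handle_turn("when will it arrive?", session)  # number recalled from session
```

The second turn contains no order number, yet the handler can still answer about order 4412 because the first turn stored it.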

FAQ

Is anything I type sent to a server?

No. The matcher runs entirely in your browser as client-side JavaScript. Flows and test conversations are not stored, logged, or transmitted — closing the tab discards everything. That also means there is no save feature yet; copy your intent phrases out before you leave.

Why did my message match an intent I didn't expect?

Open the score panel. Nearly always one shared token (after stemming, so "shipped" counts as "ship") is carrying the score. Either remove that word from the wrong intent's phrases, add more specific phrases to the right intent, or raise the threshold.
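To see how a single stemmed token can carry a match, the sketch below uses a crude suffix-stripping stemmer and lists the stems a message shares with a trigger phrase. The stemming rules are an assumption chosen to reproduce the "shipped" to "ship" example. The simulator's actual stemmer may differ.

```python
import re


def stem(token: str) -> str:
    """Very rough stemmer: strip -ing, -ed, or plural -s, then undouble a final consonant."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            if suffix == "s" and token.endswith("ss"):
                break  # leave words like "miss" alone
            base = token[: -len(suffix)]
            # "shipp" -> "ship" after removing -ed or -ing
            if suffix != "s" and base[-1] == base[-2] and base[-1] not in "aeiou":
                base = base[:-1]
            return base
    return token


def shared_stems(message: str, phrase: str) -> set[str]:
    """Stems that appear in both the message and the trigger phrase."""
    def stems(text: str) -> set[str]:
        return {stem(t) for t in re.findall(r"[a-z0-9']+", text.lower())}
    return stems(message) & stems(phrase)
```

Here `shared_stems("my order shipped yesterday", "how much is shipping")` returns only the stem `ship`, which is the kind of lone shared token the score panel tends to reveal behind a surprising match.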

How accurate is this compared to real chatbot NLU?

It's deliberately weaker. Token overlap catches maybe two-thirds of phrasings a trained NLU model would catch, and it has no understanding of word meaning ("reimburse" won't match "refund" here, but would on most production platforms). Treat the simulator as a stress test for flow structure; treat vendor NLU benchmarks — which we publish in our platform reviews — as the accuracy measure.

Can I export the flow to Manychat, Botpress, or Dialogflow?

Not yet — there's no common import format across builders. The fastest manual path: each intent card becomes one intent/keyword rule in your builder, with the same phrases and reply. The structure transfers in minutes for flows of this size.

Can I embed this simulator on my site?

Yes, and it's free. Copy the iframe snippet from the embed section below. The embedded version strips Chatbotscape navigation and keeps the simulator and the attribution badge.

About this tool

Built and maintained by the Chatbotscape editorial team. We built the matcher to mirror the keyword-rule routing we encounter when testing chatbot platforms for review, and we use the simulator internally to sketch test flows before running them on real builders. The example flow ships with intentionally imperfect intent boundaries ("delivery" appears in two intents) so the failure modes described above are reproducible out of the box. Found a message that routes in a way the score panel can't explain? Email corrections@chatbotscape.com with the flow setup and we'll investigate.

Embed this tool on your site (free)

<iframe
  src="https://chatbotscape.com/embed/tools/chatbot-conversation-flow-sim/"
  width="100%" height="1100" frameborder="0"
  title="Chatbot Conversation Flow Simulator by Chatbotscape"
  loading="lazy">
</iframe>