Entity Extraction · NLU task

Entity extraction (also called Named Entity Recognition, or NER) is the NLU task of identifying and labeling specific data points in free-text messages: names, dates, locations, products, currencies, organizations, and custom domain-specific entities. In a chatbot, entity extraction pulls structured data out of user messages so the bot can act on it. For example, the phrase "Book me for Tuesday at 3 PM" is parsed into the structured fields intent=book_appointment, day=Tuesday, time=15:00.

By Chatbotscape Editorial · Methodology · Published 26 May 2026 · Updated 26 May 2026

Entity Extraction — Definition, How It Works in Chatbots (2026)

Quick answer (~1 min)

Entity extraction is identifying specific data points in what a user types — dates, names, products, locations. The chatbot uses these to fill in structured data and take action.

What it is

When a user types "Cancel my October 15 order for the blue dress in size M", a chatbot needs to know:

Intent: cancel_order
Date: October 15
Product: blue dress
Size: M

The date, product, and size are entities. Entity extraction is the NLU task of finding and labeling them.
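As a rough illustration, the parsed result for the message above could be held in a small data structure. The class name `ParsedMessage` and its field names are assumptions made for this sketch, not any platform's schema, and the values are filled in by hand rather than produced by a real extractor.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedMessage:
    """Structured view of one user message: one intent plus named entities."""
    text: str
    intent: str
    entities: dict[str, str] = field(default_factory=dict)


# Hand-written result for the example message; a real NLU component
# would produce this automatically.
parsed = ParsedMessage(
    text="Cancel my October 15 order for the blue dress in size M",
    intent="cancel_order",
    entities={"date": "October 15", "product": "blue dress", "size": "M"},
)

# Sanity check: every entity value should literally occur in the message.
for name, value in parsed.entities.items():
    assert value in parsed.text, f"{name} not found in message"
```

Keeping the original text next to the extracted values makes it easier to audit what the extractor claimed to find.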

Standard entity types in most platforms:

Person — names
Location — cities, addresses, countries
Organization — company names
Date / Time — calendar references ("tomorrow", "next Tuesday", "3 PM")
Money / Number — currency amounts, quantities
Email / Phone / URL — structured strings

Plus custom entities specific to your domain — product names, SKUs, account IDs, plan tiers, etc.

How it works

Two main approaches:

1. Rule-based extraction

Regular expressions + dictionaries: a regex catches phone numbers; a dictionary of product names matches against the message text. Cheap, predictable, language-specific.
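A minimal sketch of the rule-based approach, using one regular expression and one dictionary. The product dictionary, the SKU values, and the deliberately loose phone pattern are all hypothetical; a production pattern set would be stricter and larger.

```python
import re

# Hypothetical product dictionary: surface form -> canonical product ID.
PRODUCTS = {"blue dress": "SKU-BLUE-DRESS", "red shoes": "SKU-RED-SHOES"}

# Deliberately simple pattern for numbers such as "+1 555 123 4567".
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_rule_based(message: str) -> list[dict]:
    """Return entities found by one regex and one dictionary lookup."""
    entities = []
    for match in PHONE_RE.finditer(message):
        entities.append(
            {"type": "phone", "value": match.group(), "start": match.start()}
        )
    lowered = message.lower()
    for surface, product_id in PRODUCTS.items():
        # Word boundaries stop a match inside a longer word.
        for match in re.finditer(rf"\b{re.escape(surface)}\b", lowered):
            entities.append(
                {"type": "product", "value": product_id, "start": match.start()}
            )
    # Report entities in the order they appear in the message.
    return sorted(entities, key=lambda entity: entity["start"])
```

The sketch also hints at why this approach is language-specific: both the dictionary and the pattern encode assumptions about how one language and region writes things.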

2. ML-based NER

Neural network trained on labeled examples. Modern platforms use either pretrained NER models (spaCy, Stanford NLP) or LLM-based extraction (prompt the LLM to "extract dates, products, and sizes from this message").

LLM-based extraction handles phrasing variation and multilingual input gracefully. Rule-based extraction is faster and cheaper but brittle.
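The LLM-based variant can be sketched as a prompt builder plus a defensive JSON parser. The model call itself is left as a plain function argument (`complete`), and `fake_complete` is a canned stand-in, because the actual client and model depend on the platform; none of these names come from a real SDK.

```python
import json
from typing import Callable

FIELDS = ("date", "product", "size")


def build_prompt(message: str) -> str:
    """Ask for JSON only, so the reply can be parsed mechanically."""
    return (
        "Extract the following fields from the user message: "
        + ", ".join(FIELDS)
        + ". Reply with a single JSON object and use null for missing fields.\n"
        + f"Message: {message}"
    )


def extract_with_llm(message: str, complete: Callable[[str], str]) -> dict:
    """`complete` is any function that sends a prompt to a model and returns text."""
    reply = complete(build_prompt(message))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep only the requested fields; the model may add or omit keys.
    return {name: data.get(name) for name in FIELDS}


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call, returning a canned reply."""
    return '{"date": "October 15", "product": "blue dress", "size": "M"}'
```

Treating the reply as untrusted input (invalid JSON, extra keys, missing keys) is a design choice in this sketch: model output is not guaranteed to follow the requested format.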

Entity extraction in chatbot platforms

Dialogflow — first-class entity definitions, system entities (date, location, currency, etc.) + custom entities.
Botpress — visual entity definitions, LLM-extraction available.
Rasa, Microsoft Bot Framework — similar explicit entity approach.

Marketing-focused platforms (Manychat, Chatfuel) typically capture entities through explicit form fields or button menus rather than NER on free text. Customer-service and AI-agent platforms make heavier use of NER.

When entity extraction matters

Transactional flows. Booking, cancellation, lookup — these need structured data extracted from free text.
CRM enrichment. Pulling names, companies, emails from conversation to populate records.
Search and filtering. "Show me red shoes under $100" needs entity extraction (color, max_price) to translate to a database query.
Multi-step conversations. Slot-filling — asking the user for each missing entity until all are collected.
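To make the search and filtering case concrete, here is a small sketch that turns "Show me red shoes under $100" into a parameterized query. The color list, the `products` table, and its `color` and `price` columns are hypothetical, and the extraction is plain rule-based matching rather than a trained model.

```python
import re

COLORS = {"red", "blue", "black", "white"}  # hypothetical color dictionary


def parse_search(message: str) -> dict:
    """Extract color and max_price from a shopping query."""
    lowered = message.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    color = next((c for c in sorted(COLORS) if c in words), None)
    price = re.search(r"under \$?(\d+(?:\.\d+)?)", lowered)
    return {
        "color": color,
        "max_price": float(price.group(1)) if price else None,
    }


def to_sql(filters: dict) -> tuple[str, list]:
    """Build a parameterized query; values are never spliced into the SQL text."""
    clauses, params = [], []
    if filters["color"] is not None:
        clauses.append("color = ?")
        params.append(filters["color"])
    if filters["max_price"] is not None:
        clauses.append("price < ?")
        params.append(filters["max_price"])
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM products WHERE {where}", params
```

Passing extracted values as query parameters, instead of formatting them into the SQL string, keeps free-text user input from altering the query itself.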

Common pitfalls

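Ambiguous references. "Last Monday" depends on the current date, and "Sales rep John" depends on which John is meant. Production systems resolve relative dates against the user's current date and time zone, and ask a clarifying question when a name matches more than one record.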
Custom entities require maintenance. Product catalogs change; entity dictionaries need updates.
Multi-language gotchas. Date formats differ (US: month/day; LATAM/EU: day/month). Currency symbols vary. Language-specific NER training data is uneven.
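The dependence of relative dates on the current date can be shown with a short resolver that takes the reference date as an explicit argument. It handles only a handful of English phrases, and it assumes "next Tuesday" means the first Tuesday strictly after today; some speakers mean the Tuesday of the following week, so that rule is a labeled assumption, not a standard.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def resolve_relative_date(phrase: str, today: date) -> Optional[date]:
    """Resolve a few relative date phrases against an explicit reference date."""
    words = phrase.lower().split()
    if words == ["today"]:
        return today
    if words == ["tomorrow"]:
        return today + timedelta(days=1)
    if len(words) == 2 and words[0] in ("next", "last") and words[1] in WEEKDAYS:
        target = WEEKDAYS.index(words[1])
        if words[0] == "next":
            # Days until the next occurrence strictly after today (1 to 7).
            delta = (target - today.weekday() - 1) % 7 + 1
            return today + timedelta(days=delta)
        # Days since the most recent occurrence strictly before today (1 to 7).
        delta = (today.weekday() - target - 1) % 7 + 1
        return today - timedelta(days=delta)
    # Anything else is left unresolved instead of guessed.
    return None
```

With `today = date(2026, 5, 26)` (a Tuesday), "last Monday" resolves to 25 May 2026 and "next Tuesday" to 2 June 2026; the same phrases give different dates on any other day, which is why the reference date has to be passed in rather than assumed.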

Worked example — slot-filling a booking flow

A salon chatbot needs to capture three entities to book an appointment: service, date, and time. The table shows how the bot handles six independent user messages, with relative dates resolved against Tuesday, 26 May 2026:

Case	User message	Entities extracted	Bot next action
1	"Hi, I want to book a haircut for Friday at 3 PM"	service=haircut, date=29 May 2026, time=15:00	Confirm and book
2	"I need an appointment"	(none)	Ask: "What service, and when?"
3	"tomorrow at 2"	date=27 May 2026, time=14:00, service=null	Ask: "Which service?"
4	"the cheap one I had last time"	(anaphoric — needs CRM lookup)	Lookup user's history, propose match
5	"haircut next Tuesday around 3-ish"	service=haircut, date=2 June 2026, time≈15:00 (fuzzy)	Confirm with "3 PM, OK?"
6	"quero cortar cabelo amanhã às 14h" (PT-BR)	service=haircut, date=27 May 2026, time=14:00	Confirm in PT-BR

Cases 3 and 5 illustrate why fuzzy time and date handling matters in production. Case 4 shows that entity extraction alone is insufficient when references are anaphoric: the bot must combine NER with a CRM lookup. Case 6 demonstrates the multilingual edge case: a PT-BR-trained NER understands "amanhã" (tomorrow) and "às 14h" (at 2 PM), while an English-only NER would fail entirely.
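The slot-filling logic behind the table can be sketched as two small functions: one merges newly extracted entities into the slots collected so far, the other decides what to ask next. The prompt wording and function names are assumptions, and unlike the table's second row this sketch asks for one missing slot at a time.

```python
REQUIRED_SLOTS = ("service", "date", "time")

PROMPTS = {
    "service": "Which service?",
    "date": "Which day?",
    "time": "What time?",
}


def merge_slots(current: dict, extracted: dict) -> dict:
    """Add newly extracted values; a None value never erases an earlier one."""
    merged = dict(current)
    merged.update({k: v for k, v in extracted.items() if v is not None})
    return merged


def next_action(slots: dict) -> str:
    """Return the bot's next message given the slots collected so far."""
    missing = [name for name in REQUIRED_SLOTS if slots.get(name) is None]
    if missing:
        # Ask for the first missing slot only.
        return PROMPTS[missing[0]]
    return (
        f"Booking {slots['service']} on {slots['date']} "
        f"at {slots['time']}. Confirm?"
    )
```

For the "tomorrow at 2" message, merging `{"date": "27 May 2026", "time": "14:00", "service": None}` into an empty state leaves `service` missing, so `next_action` returns "Which service?".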

Entity types in production

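Most platforms ship a standard library of pre-built entity types you can use without custom training. The @sys names below are illustrative, in the style of Dialogflow's system entities; exact identifiers vary by platform: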

Entity type	Examples it catches	Edge cases that fail
@sys.date	"tomorrow", "next Monday", "May 29"	"the day after Easter" (calendar lookup needed)
@sys.time	"3 PM", "15:00", "noon"	"around 3-ish" (fuzzy parsing required)
@sys.number	"5", "five", "a dozen"	"a few" (qualitative)
@sys.currency	"$50", "50 dollars", "fifty bucks"	"the equivalent of 5,000 yen in dollars"
@sys.email	"user@example.com"	"user [at] example [dot] com" (anti-scrape obfuscation)
@sys.phone	"+1 555 123 4567"	inconsistent international formats
@sys.location	"São Paulo", "Mexico City"	small towns missing from gazetteers

Custom entities (@product, @plan_tier, @account_id) cover business-specific vocabulary that no platform ships by default. These require ongoing maintenance as product catalogs evolve.
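Two of the edge cases in the table can often be handled with a normalization step before the standard extractor runs. The sketch below rewrites "[at]" and "[dot]" obfuscation before matching emails, and parses "3-ish" as an approximate hour. The opening hours used to choose between AM and PM are an assumption for a business like the salon example, and both patterns are intentionally narrow.

```python
import re
from typing import Optional

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
FUZZY_HOUR_RE = re.compile(r"\b(\d{1,2})\s*-?\s*ish\b", re.IGNORECASE)


def deobfuscate_email(text: str) -> str:
    """Rewrite '[at]' and '[dot]' style obfuscation into '@' and '.'."""
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)
    return text


def find_emails(text: str) -> list[str]:
    """Find email addresses after undoing simple obfuscation."""
    return EMAIL_RE.findall(deobfuscate_email(text))


def parse_fuzzy_hour(
    text: str, open_hour: int = 8, close_hour: int = 19
) -> Optional[dict]:
    """Parse phrases like 'around 3-ish' into an hour flagged as approximate."""
    match = FUZZY_HOUR_RE.search(text)
    if not match:
        return None
    hour = int(match.group(1))
    if hour > 23:
        return None
    # Assumption: an hour before opening time is meant as the afternoon.
    if hour < open_hour and hour + 12 <= close_hour:
        hour += 12
    return {"time": f"{hour:02d}:00", "approximate": True}
```

The `approximate` flag gives the dialog layer a reason to confirm ("3 PM, OK?") instead of booking silently.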

Related terms

Natural Language Understanding — the broader NLU category.
Intent recognition — the complementary NLU task.
Natural Language Processing — the parent field.
Chatbot security and PII handling — protecting the personal data entity extraction captures.

FAQ

Is entity extraction the same as NER?

Yes — "Named Entity Recognition" is the academic / technical term; "entity extraction" is more common in product documentation. Both refer to the same task.

Can LLMs do entity extraction without training data?

Yes. Prompt a modern LLM with "Extract the date, product, and size from this message: [user message here]" and it works zero-shot for most common entity types in major languages. Dedicated NER models are still cheaper to run at scale.

What's the difference between entity extraction and intent recognition?

Intent = what the user wants (action). Entity = what specific data points the action needs (parameters). Both run on the same message; both fill in slots in the chatbot's response logic.

How accurate is modern entity extraction?

For well-defined entity types (date, time, currency, email) in major languages, modern LLM-based extraction reaches 90-97% precision. Custom domain entities (specific product SKUs, account IDs) depend heavily on training data quality and gazetteer coverage — well-maintained custom extractors hit 85-95%, neglected ones drop to 60-70%.
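Figures like these are typically measured by comparing extracted entities with a hand-labeled set. A minimal sketch of that calculation, assuming exact-match scoring on (type, value) pairs; real evaluations may also give credit for partial span overlap.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Score extracted entities against hand-labeled ones using exact matches."""
    true_positives = len(predicted & gold)
    # Precision: share of extracted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of labeled entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {("date", "October 15"), ("product", "blue dress"), ("size", "M")}
predicted = {
    ("date", "October 15"),
    ("product", "blue dress"),
    ("size", "L"),        # wrong value
    ("color", "blue"),    # entity that was never labeled
}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

In this made-up example two of four extracted entities are correct (precision 0.5) and two of three labeled entities were found (recall about 0.67), which shows why a precision figure alone does not say how many entities an extractor misses.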

Can I extract entities from voice input?

Yes, but the pipeline adds latency and error compounding: speech-to-text transcription (95-99% accuracy in clean audio) feeds into entity extraction, so any transcription error propagates. Voice-specific gotchas include digit homophone confusion (15 vs 50), proper-noun mishearings, and dialect-specific number formats. Pair voice entity extraction with explicit user confirmation for high-stakes data.
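The compounding effect can be approximated with simple arithmetic, under the simplifying assumption that transcription and extraction errors are independent. The input values and the list of high-stakes entity types below are illustrative choices for this sketch, not measurements.

```python
# Hypothetical set of entity types that warrant a read-back to the user.
HIGH_STAKES = {"phone", "account_id", "amount", "date", "time"}


def end_to_end_accuracy(stt_accuracy: float, extraction_accuracy: float) -> float:
    """Chance that both stages are right, assuming independent errors."""
    return stt_accuracy * extraction_accuracy


def needs_confirmation(entity_type: str, from_voice: bool) -> bool:
    """Ask the user to confirm high-stakes entities captured from speech."""
    return from_voice and entity_type in HIGH_STAKES


# A 95% transcription stage feeding a 90% extraction stage.
combined = end_to_end_accuracy(0.95, 0.90)
```

Here `combined` comes out at roughly 0.855, lower than either stage alone, which is the reason for confirming values such as phone numbers and amounts when they arrive by voice.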

Sources

Dialogflow documentation. Entities concepts. cloud.google.com/dialogflow (verified 26 May 2026).
Stanford NLP. NER course materials. nlp.stanford.edu/ner (verified 26 May 2026).