Entity Extraction — Definition, How It Works in Chatbots (2026)
What it is
When a user types "Cancel my October 15 order for the blue dress in size M", a chatbot needs to know:
- Intent: cancel_order
- Date: October 15
- Product: blue dress
- Size: M
The date, product, and size are entities. Entity extraction is the NLU task of finding and labeling them.
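One way to picture the output of this task is a small record holding the intent plus a list of labeled spans. The sketch below is illustrative only: the class names `Entity` and `ParsedMessage` and the helper `span` are hypothetical, not any platform's schema, and the values are filled in by hand rather than by a model.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    type: str   # entity label, e.g. "date"
    value: str  # text captured from the message
    start: int  # character offset where the span starts
    end: int    # character offset just past the span

@dataclass
class ParsedMessage:
    text: str
    intent: str
    entities: list = field(default_factory=list)

def span(text: str, type_: str, value: str) -> Entity:
    # Hypothetical helper: locate the first occurrence of `value` in `text`.
    start = text.index(value)
    return Entity(type_, value, start, start + len(value))

message = "Cancel my October 15 order for the blue dress in size M"
parsed = ParsedMessage(
    text=message,
    intent="cancel_order",
    entities=[
        span(message, "date", "October 15"),
        span(message, "product", "blue dress"),
        span(message, "size", "M"),
    ],
)
```

Keeping character offsets alongside the values is a common design choice because it lets downstream code highlight or replace the exact span that was matched.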
Standard entity types in most platforms:
- Person — names
- Location — cities, addresses, countries
- Organization — company names
- Date / Time — calendar references ("tomorrow", "next Tuesday", "3 PM")
- Money / Number — currency amounts, quantities
- Email / Phone / URL — structured strings
Plus custom entities specific to your domain — product names, SKUs, account IDs, plan tiers, etc.
How it works
Two main approaches:
1. Rule-based extraction
Regular expressions + dictionaries: a regex catches phone numbers; a dictionary of product names matches against the message text. Cheap, predictable, language-specific.
2. ML-based NER
Neural network trained on labeled examples. Modern platforms use either pretrained NER models (spaCy, Stanford NLP) or LLM-based extraction (prompt the LLM to "extract dates, products, and sizes from this message").
LLM-based extraction handles phrasing variation and multilingual input gracefully. Rule-based extraction is faster and cheaper but brittle.
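The rule-based approach can be sketched in a few lines of Python. This is a minimal illustration, not a production extractor: the product list, the size list, and the regular expressions are assumptions chosen for the example, and real phone and email patterns need far more care.

```python
import re

# Hypothetical product dictionary; a real one would be loaded from the catalog.
PRODUCTS = ["blue dress", "red shoes", "black jacket"]

# Deliberately simple patterns, for illustration only.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SIZE_RE = re.compile(r"\bsize\s+(XS|S|M|L|XL)\b", re.IGNORECASE)

def extract_rule_based(text: str) -> dict:
    """Return every regex or dictionary match found in the message."""
    lowered = text.lower()
    return {
        "phone": [m.group() for m in PHONE_RE.finditer(text)],
        "email": EMAIL_RE.findall(text),
        # Dictionary lookup: whole-phrase match against the lowercased text.
        "product": [
            p for p in PRODUCTS
            if re.search(rf"\b{re.escape(p)}\b", lowered)
        ],
        "size": [m.group(1).upper() for m in SIZE_RE.finditer(text)],
    }

order = extract_rule_based("Cancel my October 15 order for the blue dress in size M")
contact = extract_rule_based("Call me at +1 555 123 4567 or mail jo@example.com")
```

The brittleness mentioned above shows up quickly: "the dress in blue" or "medium" would both be missed unless someone adds a rule for them.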
Entity extraction in chatbot platforms
- Dialogflow — first-class entity definitions, system entities (date, location, currency, etc.) + custom entities.
- Botpress — visual entity definitions, LLM-extraction available.
- Rasa, Microsoft Bot Framework — similar explicit entity approach.
Marketing-focused platforms (Manychat, Chatfuel) typically capture entities through explicit form fields or button menus rather than NER on free text. Customer-service and AI-agent platforms make heavier use of NER.
When entity extraction matters
- Transactional flows. Booking, cancellation, lookup — these need structured data extracted from free text.
- CRM enrichment. Pulling names, companies, emails from conversation to populate records.
- Search and filtering. "Show me red shoes under $100" needs entity extraction (color, max_price) to translate to a database query.
- Multi-step conversations. Slot-filling — asking the user for each missing entity until all are collected.
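To make the search-and-filtering case concrete, the sketch below turns "Show me red shoes under $100" into a parameterized query. The color dictionary, the `products` table, and its `color` and `price` columns are all assumptions made for the example; the product category ("shoes") is left out to keep it short.

```python
import re

COLORS = {"red", "blue", "black", "white", "green"}  # hypothetical color dictionary
PRICE_RE = re.compile(r"\bunder\s+\$?(\d+(?:\.\d+)?)", re.IGNORECASE)

def extract_filters(text: str) -> dict:
    """Pull the color and max_price entities out of a search message."""
    filters = {}
    words = re.findall(r"[a-z]+", text.lower())
    color = next((w for w in words if w in COLORS), None)
    if color:
        filters["color"] = color
    price = PRICE_RE.search(text)
    if price:
        filters["max_price"] = float(price.group(1))
    return filters

def build_query(filters: dict) -> tuple:
    """Build a parameterized SQL string; values never get spliced into the SQL."""
    clauses, params = [], []
    if "color" in filters:
        clauses.append("color = ?")
        params.append(filters["color"])
    if "max_price" in filters:
        clauses.append("price < ?")
        params.append(filters["max_price"])
    where = " AND ".join(clauses) or "1 = 1"
    return f"SELECT * FROM products WHERE {where}", params

filters = extract_filters("Show me red shoes under $100")
sql, params = build_query(filters)
```

Passing extracted values as query parameters rather than formatting them into the SQL string matters here, because the values come straight from user text.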
Common pitfalls
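- Ambiguous references. "Last Monday" depends on the current date. "Sales rep John" depends on which John. Production systems resolve relative dates against the conversation's date and time zone, and ask a clarifying question when a name matches more than one record.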
- Custom entities require maintenance. Product catalogs change; entity dictionaries need updates.
- Multi-language gotchas. Date formats differ (US: month/day; LATAM/EU: day/month). Currency symbols vary. Language-specific NER training data is uneven.
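The date-format pitfall is easy to reproduce. The sketch below parses the same numeric string under both conventions and flags inputs that are valid either way; the function names and the `day_first` flag are hypothetical, and a real system would take the convention from the user's locale.

```python
from datetime import date, datetime

def parse_numeric_date(text: str, day_first: bool) -> date:
    """Parse 'NN/NN/YYYY' as day/month or month/day depending on locale."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()

def is_ambiguous(text: str) -> bool:
    """True when both readings are valid calendar dates and they differ."""
    first, second, _year = (int(part) for part in text.split("/"))
    return first <= 12 and second <= 12 and first != second

us_reading = parse_numeric_date("03/04/2026", day_first=False)  # 4 March
eu_reading = parse_numeric_date("03/04/2026", day_first=True)   # 3 April
```

When `is_ambiguous` returns true and the locale is unknown, asking the user to confirm the date is usually safer than guessing.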
Worked example — slot-filling a booking flow
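A salon chatbot needs to capture three entities to book an appointment: service, date, and time. Here is how the bot handles realistic user messages. Each row is an independent message, and relative dates are resolved as if the conversation took place on Tuesday, 26 May 2026: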
| Case | User message | Entities extracted | Bot next action |
|---|---|---|---|
| 1 | "Hi, I want to book a haircut for Friday at 3 PM" | service=haircut, date=29 May 2026, time=15:00 | Confirm and book |
| 2 | "I need an appointment" | (none) | Ask: "What service, and when?" |
| 3 | "tomorrow at 2" | date=27 May 2026, time=14:00, service=null | Ask: "Which service?" |
| 4 | "the cheap one I had last time" | (anaphoric — needs CRM lookup) | Lookup user's history, propose match |
| 5 | "haircut next Tuesday around 3-ish" | service=haircut, date=2 June 2026, time≈15:00 (fuzzy) | Confirm with "3 PM, OK?" |
| 6 | "quero cortar cabelo amanhã às 14h" (PT-BR) | service=haircut, date=27 May 2026, time=14:00 | Confirm in PT-BR |
Cases 3 and 5 illustrate why fuzzy time and date handling matters in production. Case 4 shows how entity extraction alone is insufficient when references are anaphoric — the bot must combine NER with CRM lookup. Case 6 demonstrates the multilingual edge: a PT-BR-trained NER understands "amanhã" (tomorrow) and "às 14h" (at 2 PM), while an English-only NER would fail entirely.
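The English cases in the table can be approximated with a small rule-based slot filler. This is a sketch under several assumptions: the service menu is hypothetical, a bare hour between 1 and 7 is assumed to mean afternoon, "Tuesday" and "next Tuesday" are treated the same (the next occurrence strictly after today), and fuzziness markers such as "-ish" are accepted but not tracked. Cases 4 and 6 (CRM lookup and PT-BR input) are out of scope.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
SERVICES = ("haircut", "manicure", "coloring")  # hypothetical service menu
REQUIRED_SLOTS = ("service", "date", "time")
TIME_RE = re.compile(
    r"\b(?:at|around)\s+(\d{1,2})(?::(\d{2}))?(?:-ish)?(?:\s*(am|pm)\b)?",
    re.IGNORECASE,
)

def resolve_date(text: str, today: date):
    """Resolve 'tomorrow' or a weekday name relative to `today`."""
    lowered = text.lower()
    if "tomorrow" in lowered:
        return today + timedelta(days=1)
    for index, name in enumerate(WEEKDAYS):
        if re.search(rf"\b{name}\b", lowered):
            # Always 1 to 7 days ahead, never today itself.
            days_ahead = (index - today.weekday() - 1) % 7 + 1
            return today + timedelta(days=days_ahead)
    return None

def resolve_time(text: str):
    """Return a 24-hour 'HH:MM' string, or None if no time was found."""
    match = TIME_RE.search(text)
    if not match:
        return None
    hour, minute = int(match.group(1)), int(match.group(2) or 0)
    meridiem = (match.group(3) or "").lower()
    if meridiem == "pm" and hour < 12:
        hour += 12
    elif meridiem == "am" and hour == 12:
        hour = 0
    elif not meridiem and 1 <= hour <= 7:
        hour += 12  # assumption: a bare "2" or "3" means afternoon for a salon
    return f"{hour:02d}:{minute:02d}"

def extract_slots(text: str, today: date) -> dict:
    lowered = text.lower()
    service = next((s for s in SERVICES if s in lowered), None)
    return {"service": service, "date": resolve_date(text, today), "time": resolve_time(text)}

def next_action(slots: dict) -> str:
    """Book when every slot is filled, otherwise ask for what is missing."""
    missing = [name for name in REQUIRED_SLOTS if slots[name] is None]
    return "confirm_and_book" if not missing else "ask_" + "_and_".join(missing)

today = date(2026, 5, 26)  # the Tuesday assumed by the table's dates
for text in ["I need an appointment", "tomorrow at 2", "haircut next Tuesday around 3-ish"]:
    slots = extract_slots(text, today)
    print(text, "->", slots, next_action(slots))
```

Passing `today` in explicitly, rather than calling `date.today()` inside the resolver, keeps relative-date handling testable and makes the time-zone decision visible to the caller.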
Entity types in production
Most platforms ship a standard library of pre-built entity types you can use without custom training:
| Entity type | Examples it catches | Edge cases that fail |
|---|---|---|
| @sys.date | "tomorrow", "next Monday", "May 29" | "the day after Easter" (calendar lookup needed) |
| @sys.time | "3 PM", "15:00", "noon" | "around 3-ish" (fuzzy parsing required) |
| @sys.number | "5", "five", "a dozen" | "a few" (qualitative) |
| @sys.currency | "$50", "50 dollars", "fifty bucks" | "the equivalent of 5,000 yen in dollars" |
| @sys.email | "user@example.com" | "user [at] example [dot] com" (anti-scrape obfuscation) |
| @sys.phone | "+1 555 123 4567" | inconsistent international formats |
| @sys.location | "São Paulo", "Mexico City" | small towns missing from gazetteers |
Custom entities (@product, @plan_tier, @account_id) cover business-specific vocabulary that no platform ships by default. These require ongoing maintenance as product catalogs evolve.
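Some of the failing edge cases in the table can be recovered with a normalization pass before extraction. As one illustration, the sketch below rewrites "[at]" and "[dot]" obfuscation and then applies a simple email pattern. Both the `deobfuscate` helper and the pattern are assumptions for the example, not a built-in feature of any platform.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def deobfuscate(text: str) -> str:
    """Rewrite '[at]' / '(at)' and '[dot]' / '(dot)' into '@' and '.'."""
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)
    return text

def extract_emails(text: str) -> list:
    return EMAIL_RE.findall(deobfuscate(text))

raw = "Reach me at user [at] example [dot] com please"
emails = extract_emails(raw)
```

The same pattern (normalize first, then match) applies to other rows, such as stripping separators from phone numbers before validating them.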
Related terms
- Natural Language Understanding — the broader NLU category.
- Intent recognition — the complementary NLU task.
- Natural Language Processing — the parent field.
FAQ
Is entity extraction the same as NER?
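Largely, yes. "Named Entity Recognition" (NER) is the academic and technical term; "entity extraction" is more common in product documentation. In the strict sense, NER covers named entities such as people, organizations, and locations, while chatbot platforms also use "entity extraction" for dates, numbers, and custom domain values. In practice the two terms are used interchangeably.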
Can LLMs do entity extraction without training data?
Yes. Prompt a modern LLM with "Extract the date, product, and size from this message: [user message here]" and it works zero-shot for most common entity types in major languages. Dedicated NER models are still cheaper to run at scale.
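A zero-shot setup mostly comes down to building the prompt and validating whatever comes back. The sketch below shows both halves without committing to a provider: `fake_llm` is a hard-coded stand-in for a real model call, and the JSON reply format is an assumption that the prompt asks for, not something a model is guaranteed to follow.

```python
import json

ENTITY_TYPES = ["date", "product", "size"]

def build_prompt(message: str) -> str:
    keys = ", ".join(ENTITY_TYPES)
    return (
        f"Extract the {keys} from this message. "
        "Reply with a JSON object using exactly those keys; "
        "use null for anything missing.\n"
        f"Message: {message}"
    )

def parse_reply(reply: str) -> dict:
    """Keep only the expected keys; fall back to all-None on malformed output."""
    empty = {key: None for key in ENTITY_TYPES}
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    return {key: data.get(key) for key in ENTITY_TYPES}

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call, hard-coded so the sketch runs offline.
    return '{"date": "October 15", "product": "blue dress", "size": "M", "mood": "annoyed"}'

prompt = build_prompt("Cancel my October 15 order for the blue dress in size M")
entities = parse_reply(fake_llm(prompt))
```

Validating the reply matters because a model may add keys, omit keys, or return text that is not JSON at all; treating those cases as "nothing extracted" lets the bot fall back to asking the user.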
What's the difference between entity extraction and intent recognition?
Intent is what the user wants (the action). Entities are the specific data points the action needs (its parameters). Both are extracted from the same message: the intent selects the flow, and the entities fill that flow's slots.
How accurate is modern entity extraction?
For well-defined entity types (date, time, currency, email) in major languages, modern LLM-based extraction reaches 90-97% precision. Custom domain entities (specific product SKUs, account IDs) depend heavily on training data quality and gazetteer coverage — well-maintained custom extractors hit 85-95%, neglected ones drop to 60-70%.
Can I extract entities from voice input?
Yes, but the pipeline adds latency and error compounding: speech-to-text transcription (95-99% accuracy in clean audio) feeds into entity extraction, so any transcription error propagates. Voice-specific gotchas include digit homophone confusion (15 vs 50), proper-noun mishearings, and dialect-specific number formats. Pair voice entity extraction with explicit user confirmation for high-stakes data.
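The confirmation step can be as simple as a rule that fires for numbers known to be confused in speech. The sketch below covers only the English "-teen" versus "-ty" pairs (15 vs 50 and so on); the function name, the `high_stakes` flag, and the prompt wording are assumptions for illustration.

```python
from typing import Optional

# English "-teen" / "-ty" pairs that speech-to-text commonly confuses.
CONFUSABLE = {13: 30, 14: 40, 15: 50, 16: 60, 17: 70, 18: 80, 19: 90}
CONFUSABLE.update({ty: teen for teen, ty in list(CONFUSABLE.items())})

def confirmation_prompt(slot: str, value: int, high_stakes: bool) -> Optional[str]:
    """Return a question to ask the user, or None if no confirmation is needed."""
    if value in CONFUSABLE:
        return f"Did you say {value} or {CONFUSABLE[value]} for {slot}?"
    if high_stakes:
        return f"Just to confirm, {slot} is {value}?"
    return None

question = confirmation_prompt("quantity", 15, high_stakes=False)
```

Low-stakes values outside the confusable set pass through without a prompt, which keeps the added latency limited to the cases where a transcription error is most likely or most costly.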
Sources
- Dialogflow documentation. Entities concepts. cloud.google.com/dialogflow (verified 26 May 2026).
- Stanford NLP. NER course materials. nlp.stanford.edu/ner (verified 26 May 2026).