Data Sources
This page catalogs every external data source Chatbotscape uses across its review, scoring, and ratings infrastructure. For each source, we document what it provides, how it's queried, what the refresh cadence is, what cost or rate-limit constraints apply, and what failure modes we've observed. Researchers, journalists, and methodology auditors can use this page to triangulate any Chatbotscape claim against its underlying source.
Primary sources for claim verification
Vendor public pages
The canonical source for any vendor's pricing, features, channels, integrations, certifications, and partnership claims. Every Tier 1 review verifies these claims directly against the vendor's current public pages within 30 days of publication for pricing and 6 months for functional claims.
| Vendor page type | Used for | Refresh cadence | Capture method |
|---|---|---|---|
| Pricing page | Pricing tiers, contact/seat limits, overage rates, free trial duration, currency variations | 90 days Tier 1 | Direct browser visit; monthly-billed toggle activated; URL parameters tested for multi-region |
| Product/features pages | AI capabilities, NLU language support, RAG/multimodal, BYOLLM claims | 6 months Tier 1 | Direct browser visit; feature claims captured against published vendor documentation |
| Integrations page | Native integration list, partner-platform names, third-party connector availability | 6 months Tier 1 | Direct browser visit; each named integration verified by name on vendor's integration directory |
| Channels page | Channel availability per tier, beta vs production status, BSP routing | 6 months Tier 1 | Direct browser visit; per-channel page checked for beta indicators |
| Help center / docs | Feature operational details, configuration depth, language coverage of documentation | 6 months Tier 1 | Direct browser visit; language switcher checked for help center localization scope |
| Trust / compliance / security pages | HIPAA, SOC 2, GDPR, LGPD, PCI-DSS certification claims | 6 months Tier 1 | Direct browser visit; certification claims verified on vendor's trust page or compliance documentation |
| About page | Founders, founding year, headquarters, team size | 12 months Tier 1 | Direct browser visit; cross-checked against LinkedIn vendor profile and press releases |
Failure modes we've observed and guard against:
- Vendors A/B testing pricing pages with different cohort-specific pricing → we capture from a clean session without prior cookies
- Vendors showing default-region currency that differs from the buyer's actual purchase currency → we capture native USD where the vendor supports it, and flag conversion math otherwise
- Vendors burying tier limits inside expandable accordions or tooltip overlays → we expand all UI elements before capture
- Vendors using "contact for pricing" framing on lower-tier plans for SMB segments → we capture this as a methodology note, not as missing data
Third-party aggregator pages (G2, Capterra, TrustPilot)
The canonical source for user-voice claims: review counts, star ratings, sub-rating breakdowns, and recurring strength and weakness patterns. Used in the User-Feedback-Patterns section of every Tier 1 review.
| Aggregator | URL pattern | Refresh cadence | What we extract |
|---|---|---|---|
| G2 | g2.com/products//reviews | 6 months Tier 1 | Star rating (overall), review count, sub-rating breakdown (Ease of Use, Features, Customer Service, Likelihood to Recommend), recurring strengths/weaknesses themes |
| Capterra | capterra.com/p/// | 6 months Tier 1 | Star rating, review count, sub-ratings (Ease of Use, Features, Value for Money, Customer Service), Capterra's vendor-rating breakdown |
| TrustPilot | trustpilot.com/review/ | 6 months Tier 1 | Star rating, review count, sentiment trend, complaint categories |
| Reddit (r/) | reddit.com/r/ | Spot check at refresh | Recent community discussion themes; supplementary signal only |
| Product Hunt | producthunt.com/products/ | 12 months Tier 1 | Upvote count, launch context (signal of vendor stage) |
| Reclame Aqui (Brazil-focused) | reclameaqui.com.br/ | 6 months for LATAM-priority reviews | Brazilian consumer complaint patterns; LATAM-priority signal |
Failure modes we've observed:
- G2 review counts in training data inflated roughly 48× over actual (claimed 7,800 vs actual 163) → we always verify against the current G2 page
- TrustPilot ratings often diverge significantly from G2/Capterra (G2 4.6 vs TrustPilot 2.5) → we publish both rather than assuming parity
- Capterra does not publicly show every sub-rating dimension for every platform → we publish what's visible and flag what's missing
- Selective excerpting of aggregator reviews to manufacture sentiment → we report the dominant signal across at least 6 months, not the most flattering or most damning excerpts
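The first two failure modes above reduce to simple arithmetic checks. The following is a minimal sketch, not Chatbotscape's actual tooling: the names `AggregatorRating`, `divergent_pairs`, and `inflation_factor` are hypothetical, and the 1.0-star divergence threshold is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AggregatorRating:
    source: str   # aggregator name, e.g. "G2"
    stars: float  # overall rating on a 5-point scale, read from the live page


def divergent_pairs(ratings, threshold=1.0):
    """Return (source_a, source_b, gap) for each pair whose ratings differ
    by at least `threshold` stars. The threshold is an assumed value."""
    flagged = []
    for a, b in combinations(ratings, 2):
        gap = round(abs(a.stars - b.stars), 2)
        if gap >= threshold:
            flagged.append((a.source, b.source, gap))
    return flagged


def inflation_factor(claimed_count, live_count):
    """Ratio of a remembered or claimed review count to the live count."""
    if live_count <= 0:
        raise ValueError("live_count must be positive")
    return claimed_count / live_count


# Figures taken from the failure modes listed above.
ratings = [AggregatorRating("G2", 4.6), AggregatorRating("TrustPilot", 2.5)]
print(divergent_pairs(ratings))
print(round(inflation_factor(7800, 163), 1))
```

A flagged pair would mean both ratings are published side by side rather than averaged or dropped.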
Ahrefs API
The canonical source for multi-locale brand search volume — the primary input to platform popularity scoring (dimension 15 in the rubric).
| Endpoint | Used for | Cost | Refresh cadence |
|---|---|---|---|
| keywords-explorer-volume-by-country | Per-country brand search volume across 10 target locales | API units (Standard plan: 50k units/month, ~10% usage) | 90 days Tier 1, 30-day cache during active review writing |
| keywords-explorer-keyword-difficulty | KD signal for category and best-of list keyword targeting | API units | Pre-drafting for new pages; cached 30 days |
| serp-overview | SERP feature presence | API units | Spot check pre-drafting |
Target locales: United States, Brazil, Mexico, Spain, Argentina, Colombia, India, United Kingdom, Germany, France. Multi-locale aggregation matters because US-only ranking severely distorts LATAM priority: Manychat's LATAM brand search volume is 3× its US figure, SendPulse's is 11×, Typebot's Brazil volume is 32× its US volume, and AiSensy's India volume is 86× its US volume. We query the volume-by-country endpoint before making Tier assignment decisions to avoid US-centric distortion.
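The locale multiples quoted above are per-country volumes divided by the US volume. A minimal sketch of that calculation follows; the function name `ratios_vs_base` is hypothetical, the input is assumed to be a plain mapping of country code to monthly brand search volume already fetched from the API, and the sample numbers are placeholders, not real Ahrefs data.

```python
def ratios_vs_base(volumes, base="US"):
    """Express each country's brand search volume as a multiple of the base
    country's volume. Countries with missing data (None) are skipped."""
    base_volume = volumes.get(base)
    if not base_volume:
        raise ValueError(f"no usable volume for base country {base!r}")
    return {
        country: round(volume / base_volume, 1)
        for country, volume in volumes.items()
        if country != base and volume is not None
    }


# Placeholder volumes for illustration only.
sample = {"US": 1000, "BR": 32000, "MX": 500, "IN": None}
print(ratios_vs_base(sample))
```

Missing countries are left out of the result rather than reported as a ratio of zero, which keeps a data gap from reading as a measured absence of demand.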
Failure modes we guard against:
- Memory-rule-derived heuristics ("LATAM is typically 11× US for chatbot platforms") used to derive per-country numbers without fresh API query → we query the API fresh for every brand vol claim per Rule 10
- API caching staleness when brand vol shifts between scheduled refreshes → 30-day cache TTL during active review writing
- Per-country data missing for low-volume brands → flag as "below measurable threshold" rather than report a zero
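The cache and missing-data guards above can be sketched as two small helpers. This is an illustration under assumptions, not the production cache: `is_cache_fresh` and `format_volume` are hypothetical names, and missing per-country data is assumed to arrive as `None`.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(days=30)  # 30-day TTL during active review writing
BELOW_THRESHOLD = "below measurable threshold"


def is_cache_fresh(fetched_at, now=None):
    """True while a cached response is still within the 30-day TTL."""
    now = now or datetime.now(timezone.utc)
    return now - fetched_at <= CACHE_TTL


def format_volume(volume):
    """Render a volume for publication; missing data is labeled, never shown as 0."""
    if volume is None:
        return BELOW_THRESHOLD
    return f"{volume:,}"


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
print(is_cache_fresh(datetime(2026, 5, 1, tzinfo=timezone.utc), now))
print(is_cache_fresh(datetime(2026, 4, 1, tzinfo=timezone.utc), now))
print(format_volume(None), "|", format_volume(12500))
```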
Meta Business Partner Directory
The canonical source for WhatsApp Business Solution Provider (BSP) partnership claims. BSP status is one of the 14 critical pros/cons triggers and materially affects WhatsApp template approval timelines (24-48 hours for BSP partners vs 5-7 days for non-partners).
| Source | URL | Refresh cadence | What we verify |
|---|---|---|---|
| Meta Business Partner Directory | facebook.com/business/partner-directory | 6 months Tier 1, on-demand if vendor claims BSP status change | Current partner listing, partner tier (Verified Partner / Solution Provider / Premier), region availability |
| Vendor's WhatsApp product page | .com/whatsapp | 6 months | Vendor self-claimed BSP status and "expedited template approval" framing |
Verification protocol: We cite the Meta Business Partner Directory listing URL in any review that claims BSP status. Where Meta directory access is intermittent or the vendor's listing has changed tier, we soften prose to "vendor-claimed" with verification-depth disclosure rather than rely on vendor self-claim alone.
Other partner directories
| Partner program | Directory URL | Used for | Refresh cadence |
|---|---|---|---|
| Google Cloud Partner Directory | cloud.google.com/find-a-partner | Google Cloud / Marketing partnership claims | 6 months on-demand |
| AWS Partner Network | aws.amazon.com/partners | AWS partnership claims | 6 months on-demand |
| HubSpot Solutions Directory | ecosystem.hubspot.com/marketplace/solutions | HubSpot partnership claims | 6 months on-demand |
| Shopify Partner Directory | partners.shopify.com/directory | Shopify partnership claims (relevant for ecommerce-bot platforms) | 6 months on-demand |
| TikTok for Business Partner Directory | tiktok.com/business/marketing-partners | TikTok Marketing Partner status | 6 months on-demand |
Partner-status claims that cannot be verified against the relevant directory are softened to "vendor-claimed" with verification-depth disclosure per Rule 9.
Crunchbase / PitchBook / vendor press releases
The canonical source for funding totals, vendor stability claims, and corporate-relationship facts (acquisitions, board changes, executive moves).
| Source | URL | Used for | Cost |
|---|---|---|---|
| Crunchbase | crunchbase.com/organization/ | Funding rounds, lead investors, total raised, executive bio | Free tier access for basic profile; paid tier for full funding round details |
| PitchBook | pitchbook.com/profiles/company/ | Detailed funding round data, comparable transactions | Paid tier required |
| Vendor press releases | Vendor news/press section | Recent funding announcements, partnership announcements, certification announcements | Free |
| TechCrunch / The Information / Business Insider | techcrunch.com, theinformation.com, businessinsider.com | Cross-verification of funding announcements, acquisitions, executive moves | Free for TechCrunch; paid subscriptions for The Information and Business Insider |
Failure modes: Funding round details in training data are typically months or years stale. We verify any funding total claim against the most recent press release or Crunchbase entry and date-stamp the figure.
Hands-on testing observations
First-party data captured during the six-scenario testing protocol. Distinct from vendor or third-party claims — these are measurements we made directly on paid-tier accounts.
| Scenario | Measurement | Recording method | Refresh cadence |
|---|---|---|---|
| A — Basic FAQ bot | Time-to-first-bot in minutes, 20-query intent accuracy %, friction rating 1-5 | Timer + standardized test query battery + qualitative UX notes | 6 months Tier 1 |
| B — Lead capture + Sheets sync | Setup time, data fidelity %, friction rating | Timer + round-trip test submissions + qualitative UX notes | 6 months Tier 1 |
| C — WhatsApp commerce flow | Setup time, BSP template approval hours, friction rating | Timer + wall-clock template approval tracking + qualitative UX notes | 6 months Tier 1 |
| D — AI knowledge base multi-language | Intent accuracy %, citation rate %, hallucination rate % per language | 20-query test battery in EN + ES + PT-BR + market-relevant fourth language | 6 months Tier 1 |
| E — Human handover | Context-transfer friction 1-5, role-based-access correctness 1-5, multi-user inbox UX 1-5 | Qualitative UX assessment + role permission test | 6 months Tier 1 |
| F — Analytics + ad-conversion tracking | Per-area score 1-5 (dashboard, funnel builder, CSV export, ad-conversion tracking) | Qualitative dashboard audit | 6 months Tier 1 |
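The percentage measurements in the table above (intent accuracy, citation rate, hallucination rate) can be derived from per-query judgments on the 20-query battery. The sketch below assumes each rate uses the full battery as its denominator, which the protocol does not spell out here; `QueryResult` and `battery_rates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    intent_correct: bool  # bot resolved the intended question
    cited_source: bool    # answer pointed to a knowledge-base source
    hallucinated: bool    # answer contained unsupported content


def battery_rates(results):
    """Return each rate as a percentage of all queries in the battery."""
    total = len(results)
    if total == 0:
        raise ValueError("empty test battery")

    def pct(count):
        return round(100 * count / total, 1)

    return {
        "intent_accuracy_pct": pct(sum(r.intent_correct for r in results)),
        "citation_rate_pct": pct(sum(r.cited_source for r in results)),
        "hallucination_rate_pct": pct(sum(r.hallucinated for r in results)),
    }


# Synthetic 20-query battery: 17 correct, 15 cited, 2 hallucinated.
battery = [QueryResult(i < 17, i < 15, i >= 18) for i in range(20)]
print(battery_rates(battery))
```

For Scenario D the same function would be run once per language so that rates are reported per language rather than pooled.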
Test environment: Chrome on macOS, viewport 1440×900 at 2× retina. Primary locale English with secondary tests in Spanish (LATAM), Brazilian Portuguese, and a market-relevant fourth language. Test accounts created via standard public signup flow.
Inference basis for non-hands-on scenarios: Where direct hands-on measurement is not feasible (vendor demo-gates the account, paid tier blocks specific scenario testing, infrastructure access is restricted), scenario outcomes are anchored to vendor positioning, third-party reviewer reports + G2/Capterra recurring themes, comparable-platform measured benchmarks from existing reviews, and current-generation LLM capability patterns. The inference basis for each non-hands-on observation is documented in the review's POC notes sibling file. See /methodology/review-standards Rule 13 for the standardized format.
Internal data infrastructure
| Data | Purpose | Storage | Access |
|---|---|---|---|
| data/market-pricing-data.csv | Cross-platform pricing dataset for VfM scoring and comparison engines | Repo CSV | Public-facing comparison outputs derive from this; raw CSV is internal infrastructure |
| Platform DB (Postgres / Neon) | Structured platform records, scoring fields, critical-trigger tracker | Postgres | Backend; surfaces to publishable reviews via component rendering |
| Image library | Watermarked, originality-tier-tagged screenshots from hands-on testing | Cloud storage | Surfaces to publishable reviews via image components |
| Brand vol cache | 30-day Ahrefs response cache during active review writing | Cloud storage | Internal; refreshed automatically |
Source authority and cross-verification
Single-source claims are not sufficient. Every material claim in a Tier 1 review is cross-verified across at least two independent sources where the claim type supports cross-verification. Pricing: vendor pricing page + market-pricing-data CSV. Channels: vendor channels page + vendor pricing page + hands-on observation. AI capabilities: vendor AI product page + vendor help center + hands-on testing observation. Partnership status: vendor self-claim + relevant partner directory.
Vendor self-claim is not sufficient for partnership badges, certifications, or aggregate user metrics. Per Rule 9 of the review hygiene standard, vendor-claimed partnership status without external verification is softened to "vendor-claimed" framing with verification-depth disclosure. Per Rule 6, vendor-claimed aggregate scores (review counts, ratings) are always cross-verified against the relevant aggregator.
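The two rules above (at least two independent sources, and no reliance on vendor self-claim alone for partnerships, certifications, or aggregate metrics) can be expressed as a small decision function. This is a sketch under assumptions: the function name, the claim-type strings, and the "needs second source" label are hypothetical; only the "vendor-claimed" framing comes from the review hygiene standard described here.

```python
VENDOR_SELF_CLAIM = "vendor self-claim"
# Claim types where vendor self-claim alone is never sufficient.
EXTERNAL_ONLY_TYPES = {"partnership", "certification", "aggregate_metric"}


def claim_framing(claim_type, confirming_sources):
    """Decide how a claim may be framed given the sources that confirm it."""
    sources = set(confirming_sources)
    external = sources - {VENDOR_SELF_CLAIM}
    if claim_type in EXTERNAL_ONLY_TYPES and not external:
        return "vendor-claimed"
    if len(sources) >= 2:
        return "cross-verified"
    return "needs second source"  # hypothetical label for a single-source claim


print(claim_framing("partnership", [VENDOR_SELF_CLAIM]))
print(claim_framing("partnership", [VENDOR_SELF_CLAIM, "Meta Business Partner Directory"]))
print(claim_framing("pricing", ["vendor pricing page"]))
```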
Training data is never a valid source. Per Rules 6 and 10, training-data-derived claims (memory, "I recall reading", "typical for vendors in this category") are not acceptable substitutes for primary-source verification. Where primary source is unavailable, the claim is removed or labeled as estimate with methodology disclosure.
Source refresh tracking
Every Chatbotscape page that depends on external sources carries "Last verified" date fields per source type in its frontmatter:
last_verified:
  pricing: 2026-05-24
  features: 2026-05-24
  channels: 2026-05-24
  integrations: 2026-05-24
  certifications: 2026-05-24
  g2_score: 2026-05-25
  capterra_score: 2026-05-25
  trustpilot_score: 2026-05-25
  ahrefs_brand_vol: 2026-05-20
  meta_bsp_directory: 2026-05-25
The page footer surfaces the oldest "Last verified" date as the page's effective freshness signal. Where any date is more than 6 months stale, the page carries a staleness banner pointing the reader to either the next scheduled refresh date or the editorial review process contact form.
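The footer and banner logic described above amounts to taking the minimum over the "Last verified" dates and comparing each date against a cutoff. A minimal sketch follows, assuming the frontmatter has already been parsed into a mapping of source type to `date`, and approximating "6 months" as 183 days; `freshness` and `STALE_AFTER_DAYS` are hypothetical names.

```python
from datetime import date

STALE_AFTER_DAYS = 183  # assumption: "6 months" approximated as 183 days


def freshness(last_verified, today):
    """Return (oldest source type, its date, sorted list of stale source types)."""
    oldest_key = min(last_verified, key=last_verified.get)
    stale = sorted(
        key
        for key, verified in last_verified.items()
        if (today - verified).days > STALE_AFTER_DAYS
    )
    return oldest_key, last_verified[oldest_key], stale


# Subset of the frontmatter example above.
dates = {
    "pricing": date(2026, 5, 24),
    "g2_score": date(2026, 5, 25),
    "ahrefs_brand_vol": date(2026, 5, 20),
}
print(freshness(dates, today=date(2026, 11, 22)))
```

A non-empty stale list would trigger the staleness banner, while the oldest date feeds the footer regardless of whether anything is stale.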
Version history of this page
- 2026-05-26 (v1.0) — Initial publication aligned to methodology v3.12.1. Sources catalog formalized after the Tier 1 anchor review batch demonstrated the need for explicit per-source refresh-cadence documentation and failure-mode disclosure.
Related pages
- Methodology overview — full scoring rubric and source weights
- Review standards — 14 pre-publish quality gates including source verification
- Update policy — refresh cadences and version history protocol
- Editorial policy — fact-verification standards