What we tested
Metricus designed 182 prompts to reflect how real buyers search for B2B SaaS products: 150 single-turn prompts (direct questions) and 32 compound prompts (multi-step research workflows). We ran them on ChatGPT, Gemini, and Perplexity and logged which discovery mechanism, training data or live web search, powered each answer. The results give a data-driven picture of how AI decides what to recommend; the approach is documented in our AI visibility report methodology.
Training data dominates most queries
79% of AI answers about B2B SaaS came from training data, not live web search. Problem-oriented prompts (“how do I reduce churn?”) and educational questions (“what is account-based marketing?”) almost never triggered web search; the AI treated them as general knowledge it could answer from training data alone. Only tool-comparison prompts (“best CRM for startups”) reliably triggered web search, and they account for most of the 21% of answers that used live retrieval.
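As a rough illustration of how a split like this can be tallied from a response log, here is a minimal Python sketch. The record layout, the category and mechanism labels, and the sample rows are all assumptions made for illustration; they are not the study's data or Metricus's actual tooling.

```python
from collections import Counter

# Hypothetical log rows: (prompt_category, mechanism).
# These values are illustrative only, not results from the study.
SAMPLE_LOG = [
    ("problem", "training_data"),
    ("educational", "training_data"),
    ("comparison", "web_search"),
    ("problem", "training_data"),
    ("comparison", "web_search"),
    ("comparison", "training_data"),
]


def mechanism_share(log):
    """Return the fraction of responses answered by each mechanism."""
    counts = Counter(mechanism for _, mechanism in log)
    total = sum(counts.values())
    return {mechanism: count / total for mechanism, count in counts.items()}


def web_search_rate_by_category(log):
    """Return, per prompt category, the fraction of responses that used web search."""
    totals = Counter(category for category, _ in log)
    hits = Counter(category for category, mech in log if mech == "web_search")
    return {category: hits[category] / totals[category] for category in totals}


if __name__ == "__main__":
    print(mechanism_share(SAMPLE_LOG))
    print(web_search_rate_by_category(SAMPLE_LOG))
```

Breaking the rate out by prompt category is what makes the pattern described above visible: the overall share alone would hide that comparison prompts drive most of the web search activity.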
The implication: if your product is not already embedded in the data AI models were trained on, most buyer queries will never surface you, regardless of website quality, SEO, or paid campaigns.
Compound prompts behave differently
Multi-step prompts, where the buyer asks a follow-up or refines their query, triggered web search more reliably than single-turn prompts. When a buyer starts with a general question and then narrows to a specific comparison, the AI shifts from answering out of its training data (parametric mode) to retrieving live web results (retrieval-augmented generation, or RAG). The buyer’s journey through AI is therefore not uniform: early questions draw from training data, while later, more specific questions are more likely to trigger real-time search. Brands that appear only in web search results may be invisible during the initial discovery phase but surface during the comparison phase.
Multi-page web presence matters
For the 21% of prompts that did trigger web search, brands with content spread across multiple authoritative pages (their own site, G2, industry publications, comparison articles) appeared more consistently than brands with a single strong page. During retrieval, AI models pull from multiple sources and synthesize them. A brand mentioned on five different pages across three domains carries more weight in the synthesis step than a brand with one comprehensive page on its own domain.
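To make the "five pages across three domains" idea concrete, the following Python sketch counts distinct pages and distinct domains for a list of URLs that mention a brand. The URLs are placeholders on reserved example domains, the function name is hypothetical, and treating the host name (minus a leading "www.") as the domain is a simplifying assumption; this is not how any AI platform scores sources.

```python
from urllib.parse import urlparse


def source_spread(urls):
    """Return (distinct_pages, distinct_domains) for URLs mentioning a brand.

    Assumption: the host name, lowercased and without a leading "www.",
    is a good enough stand-in for "domain" in this sketch.
    """
    pages = set(urls)
    domains = set()
    for url in pages:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        domains.add(host)
    return len(pages), len(domains)


# Placeholder URLs for illustration only.
MENTIONS = [
    "https://www.example.com/product",
    "https://example.com/blog/guide",
    "https://reviews.example.org/p/acme",
    "https://reviews.example.org/compare/acme-vs-x",
    "https://news.example.net/acme-launch",
]

if __name__ == "__main__":
    pages, domains = source_spread(MENTIONS)
    print(f"{pages} pages across {domains} domains")
```

A count like this only measures breadth of presence; it says nothing about how authoritative each source is, which the findings above suggest also matters.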
Platform-level differences
ChatGPT, Gemini, and Perplexity each behaved differently. Perplexity triggered web search most frequently and cited sources most transparently. ChatGPT relied more heavily on training data for general queries. Gemini fell between the two. A brand’s visibility can therefore vary dramatically depending on which AI platform a buyer happens to use, which reinforces the need for cross-platform measurement.
Citation and source patterns
Among the 21% of prompts that triggered web search, we found distinct citation patterns. AI models overwhelmingly favored third-party review sites (G2, Capterra, TrustRadius) and analyst reports (Gartner, Forrester) over vendor-owned content. Even when a vendor’s own blog post appeared in the underlying search results, AI summaries drew their recommendations from independent sources. In practice, your own website content may contribute to AI retrieval, but third-party mentions carry more weight in the final synthesis.
Recency also mattered for RAG-driven answers. Content updated within the past 90 days was cited more frequently than older pages covering the same topic. Stale content, even when high in quality, was systematically deprioritized in favor of recently published alternatives.
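For teams auditing their own pages against that 90-day window, a minimal Python sketch of the check might look like this. The function name, the inclusive boundary, and the sample dates are assumptions for illustration; the study does not specify how the cutoff was applied.

```python
from datetime import date

FRESHNESS_WINDOW_DAYS = 90  # window taken from the finding above


def is_fresh(last_updated, as_of, window_days=FRESHNESS_WINDOW_DAYS):
    """Return True if a page was updated within `window_days` of `as_of`.

    Assumption: the boundary is inclusive, so a page updated exactly
    `window_days` ago still counts as fresh. Future dates count as stale.
    """
    age_days = (as_of - last_updated).days
    return 0 <= age_days <= window_days


if __name__ == "__main__":
    audit_date = date(2026, 3, 31)  # hypothetical audit date
    print(is_fresh(date(2026, 1, 15), audit_date))  # updated 75 days earlier
    print(is_fresh(date(2025, 11, 1), audit_date))  # updated 150 days earlier
```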
What this data means
The 79/21 split between training data and web search is the most important number in this study. It means that for the majority of buyer queries, the window to influence AI recommendations through content changes is narrow. The 21% that triggers web search is improvable through content strategy. The 79% requires presence in the sources that AI training pipelines prioritize: Wikipedia, major publications, analyst reports, and high-authority review sites. Understanding which side of that split your brand currently appears on is the starting point for any visibility strategy.
Methodology: 182 prompts were tested across ChatGPT, Gemini, and Perplexity in March 2026. The prompts were designed to reflect real B2B SaaS buyer behavior, and the discovery mechanism (training data vs. web search) was logged for each response.
Last updated: April 2026