How were the 182 prompts selected for this study?

We designed prompts to reflect real user behavior when searching for B2B SaaS products. The set included 150 single-turn prompts and 32 compound prompts covering problem-oriented questions, tool comparisons, feature-specific queries, and pricing questions.

Which AI platforms were tested?

We tested across ChatGPT, Gemini, and Perplexity. Results were aggregated across platforms to identify patterns that hold regardless of which AI a buyer uses.

Why did 79% of prompts rely on training data instead of searching the web?

AI models default to their training data whenever they estimate they already have a confident answer. Only queries that signal a need for current or comparative information reliably triggered web search.

Do these findings apply to industries outside B2B SaaS?

The core dynamics are likely to hold across most categories where buyers use AI for product research. The specific 79/21 ratio may vary by category.

What 182 LLM Prompt Tests Reveal About How AI Recommends B2B SaaS

What we tested

Metricus designed 182 prompts to reflect real buyer behavior when searching for B2B SaaS products: 150 single-turn prompts (direct questions) and 32 compound prompts (multi-step research workflows). We ran them on ChatGPT, Gemini, and Perplexity and logged which discovery mechanism, training data or live web search, powered each answer. The results show how AI decides what to recommend; the approach is documented in our AI visibility report methodology.
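The logging step described above can be sketched as a small tally. This is a minimal illustration and not the tooling used in the study: the `Response` record, its field names, the mechanism labels, and the sample rows are all hypothetical placeholders, not study data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One logged answer (hypothetical schema, not the study's actual format)."""
    prompt_id: int
    platform: str   # e.g. "ChatGPT", "Gemini", "Perplexity"
    mechanism: str  # assumed labels: "training_data" or "web_search"


def mechanism_share(responses):
    """Return the fraction of responses powered by each discovery mechanism."""
    counts = Counter(r.mechanism for r in responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mechanism: n / total for mechanism, n in counts.items()}


# Placeholder rows for illustration only; these are not results from the study.
sample = [
    Response(1, "ChatGPT", "training_data"),
    Response(1, "Perplexity", "web_search"),
    Response(2, "Gemini", "training_data"),
    Response(2, "ChatGPT", "training_data"),
]

shares = mechanism_share(sample)
print(shares)
```

Run over a full response log, the same tally would produce a training data versus web search split of the kind reported in this study.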

182 prompts tested

79% training data answers

21% web search answers

3 AI platforms tested

Training data dominates most queries

79% of AI answers about B2B SaaS came from training data, not live web search. Problem-oriented prompts (“how do I reduce churn?”) and educational questions (“what is account-based marketing?”) almost never triggered web search; the AI treated them as general knowledge questions it could answer from training data alone. Only tool-comparison prompts (“best CRM for startups”) reliably triggered web search, and they accounted for most of the 21% that used live retrieval.

The implication is significant: if your product is not already embedded in the data AI models were trained on, the majority of buyer queries will never surface you — regardless of website quality, SEO, or paid campaigns.

Compound prompts behave differently

Multi-step prompts, where the buyer asks a follow-up or refines their query, triggered web search more reliably than single-turn prompts. When a buyer starts with a general question and then narrows to a specific comparison, the AI shifts from answering out of its training data (parametric knowledge) to retrieval-augmented generation (RAG), which pulls in live web results. The buyer’s journey through AI is therefore not uniform: early questions draw on training data, while later, more specific questions are more likely to trigger real-time search. Brands that appear only in web search results may be invisible during the initial discovery phase but surface during the comparison phase.

Multi-page web presence matters

For the 21% of prompts that did trigger web search, brands with content spread across multiple authoritative pages (their own site, G2, industry publications, comparison articles) appeared more consistently than brands with a single strong page. During retrieval, AI models pull from multiple sources and synthesize them into one answer. A brand mentioned on five different pages across three domains carries more weight in the synthesis step than a brand with one comprehensive page on its own domain.
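One way to measure the spread described above is to count the distinct pages and distinct domains among the URLs that mention a brand. The sketch below is an illustration under stated assumptions: `source_spread` is a hypothetical helper, the URLs are placeholders, and "domain" is approximated by the hostname with any leading `www.` removed.

```python
from urllib.parse import urlparse


def source_spread(cited_urls):
    """Count distinct pages and distinct hostnames among URLs mentioning a brand.

    Hostnames stand in for domains here, so subdomains count separately.
    """
    pages = set()
    hosts = set()
    for url in cited_urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # Ignore a trailing slash so "/acme" and "/acme/" count as one page.
        pages.add((host, parsed.path.rstrip("/")))
        hosts.add(host)
    return len(pages), len(hosts)


# Hypothetical URLs for an imaginary brand; the example.* hosts are placeholders.
urls = [
    "https://www.example.com/product",
    "https://example.com/blog/launch",
    "https://reviews.example.org/acme",
    "https://reviews.example.org/acme/",
    "https://news.example.net/roundup",
]

page_count, host_count = source_spread(urls)
print(page_count, host_count)  # prints "4 3"
```

A brand whose mentions collapse to one page on one host is the "single strong page" case; higher counts on both axes correspond to the multi-page presence described above.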

Platform-level differences

ChatGPT, Gemini, and Perplexity each behaved differently. Perplexity triggered web search most frequently and cited sources most transparently. ChatGPT relied more heavily on training data for general queries. Gemini fell between the two. A brand’s visibility can therefore vary dramatically depending on which AI platform a buyer happens to use, which reinforces the need for cross-platform measurement.
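Cross-platform measurement amounts to computing the web search rate separately for each platform instead of only in aggregate. A minimal sketch, assuming each logged answer is reduced to a `(platform, mechanism)` pair; the function name, the mechanism labels, and the sample rows are hypothetical and are not study data.

```python
from collections import defaultdict


def web_search_rate_by_platform(log):
    """Map each platform to the fraction of its answers that used web search.

    `log` is an iterable of (platform, mechanism) pairs, where mechanism is
    "training_data" or "web_search" (an assumed encoding).
    """
    totals = defaultdict(int)
    searched = defaultdict(int)
    for platform, mechanism in log:
        totals[platform] += 1
        if mechanism == "web_search":
            searched[platform] += 1
    return {platform: searched[platform] / totals[platform] for platform in totals}


# Placeholder rows for illustration only; these are not results from the study.
log = [
    ("Perplexity", "web_search"),
    ("Perplexity", "web_search"),
    ("ChatGPT", "training_data"),
    ("ChatGPT", "web_search"),
    ("Gemini", "training_data"),
]

rates = web_search_rate_by_platform(log)
print(rates)
```

Keeping the per-platform rates side by side makes it visible when an aggregate figure hides a platform where a brand rarely surfaces.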

Citation and source patterns

Among the 21% of prompts that triggered web search, citation patterns were distinct. AI models overwhelmingly favored third-party review sites (G2, Capterra, TrustRadius) and analyst reports (Gartner, Forrester) over vendor-owned content. Even when a vendor’s own blog post appeared in the underlying search results, AI summaries drew their recommendations from independent sources. In practice, your own website content may contribute to AI retrieval, but third-party mentions carry more weight in the final synthesis.

Recency also mattered significantly for RAG-driven answers. Content updated within the past 90 days was cited more frequently than older pages covering the same topic. Stale content, even when high quality, was systematically deprioritized in favor of recently published alternatives.
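The 90-day window above can be turned into a simple freshness check for a content inventory. This is a sketch under stated assumptions: the page paths and dates are hypothetical, and the hard 90-day cutoff mirrors the figure in this study, not a documented rule of any AI platform.

```python
from datetime import date, timedelta

FRESHNESS_WINDOW = timedelta(days=90)  # the 90-day window mentioned above


def is_fresh(last_updated, today):
    """True if the page was updated within the 90 days ending at `today`."""
    age = today - last_updated
    return timedelta(0) <= age <= FRESHNESS_WINDOW


# Hypothetical pages and update dates, for illustration only.
pages = {
    "/pricing": date(2026, 2, 10),
    "/guide": date(2025, 9, 1),
}
as_of = date(2026, 3, 31)

stale = sorted(path for path, updated in pages.items() if not is_fresh(updated, as_of))
print(stale)  # prints "['/guide']"
```

Pages that land in the `stale` list are candidates for a refresh before the next measurement run.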

What this data means

The 79/21 split between training data and web search is the most important number in this study. It means that for the majority of buyer queries, the window to influence AI recommendations through content changes is narrow. The 21% that triggers web search can be improved through content strategy. The 79% requires presence in the sources that AI training pipelines prioritize: Wikipedia, major publications, analyst reports, and high-authority review sites. Understanding which category your brand falls into is the starting point for any visibility strategy.

Methodology: 182 prompts tested across ChatGPT, Gemini, and Perplexity in March 2026. Prompts designed to reflect real B2B SaaS buyer behavior. Discovery mechanism (training data vs web search) logged for each response.

Last updated: April 2026
