What we tested and how
We randomly sampled companies from BuiltIn’s list of top B2B SaaS companies, selecting across different categories: process mining, project management, cybersecurity, payments, life sciences R&D, and social media management. All are established, well-reviewed products with real market presence. (If you’re new to the concept, start with our complete guide to AI visibility.)
For each company, we asked an AI model with web search access the kinds of questions real buyers in that category would ask. Across all audits, the AI generated over 150 search queries autonomously.
For every question, we logged: whether the brand appeared, how it was mentioned (recommended, listed, compared, or absent), the sentiment, whether the AI used web search (RAG) or answered from training data (parametric), and which competing brands surfaced instead.
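To make the logging concrete, here is a minimal sketch of what one logged result could look like as a data structure. The field names and mention categories are our own illustration of the audit schema described above, not a published format.

```python
from dataclasses import dataclass, field
from enum import Enum


class MentionType(Enum):
    RECOMMENDED = "recommended"
    LISTED = "listed"
    COMPARED = "compared"
    ABSENT = "absent"


@dataclass
class AuditRecord:
    """One buyer question, one AI answer, one row in the audit log."""
    buyer_question: str
    brand_appeared: bool
    mention_type: MentionType
    sentiment: str                      # e.g. "positive", "neutral", "negative"
    used_web_search: bool               # True = RAG answer, False = parametric recall
    competing_brands: list[str] = field(default_factory=list)
```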
We anonymized the results below. The companies themselves are not the point — the patterns are.
The methodology is designed to answer one question: if a real buyer in your category asks AI for help, does your brand show up?
The visibility scores
| Company | Category | Visibility | Pattern |
|---|---|---|---|
| SaaS 1 | Process Mining | 83% | Appeared in most queries; missed in broader questions |
| SaaS 2 | Lab Software | 75% | Strong when buyer used technical language; weak otherwise |
| SaaS 3 | Project Management | 67% | Dominant in “which tool is best” queries; invisible in industry-specific ones |
| SaaS 4 | Payments | 67% | Appeared for large-company questions; absent for startup/SMB questions |
| SaaS 5 | Social Media Mgmt | 67% | Visible for mid-market and up; invisible for solo/small-business questions |
| SaaS 6 | Cybersecurity | 50% | Top pick for one niche question; invisible for most others |
The spread is notable: a 33-percentage-point gap separates the most and least visible companies in our sample, all of them well-established, well-reviewed products. These are not obscure brands. They are not lacking in product quality. The gap is in how AI discovers and surfaces them depending on who is asking. For a deeper look at how these scores are calculated and what makes them meaningful, see how AI visibility scores actually work.
The most consistent pattern across the table: every company had at least one type of buyer question where it was completely invisible. No company in our sample was visible across the board. A brand that dominates one kind of query can be entirely absent from another.
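For reference, the percentages in the table reduce to a simple ratio: the share of buyer questions in which the brand appeared at all. A minimal sketch, reusing the hypothetical AuditRecord structure from earlier:

```python
def visibility_score(records: list[AuditRecord]) -> float:
    """Percentage of buyer questions in which the brand appeared at all."""
    if not records:
        return 0.0
    appeared = sum(1 for r in records if r.brand_appeared)
    return 100 * appeared / len(records)

# Appearing in 5 of 6 buyer questions yields a score of roughly 83%.
```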
The vocabulary gap: the single biggest predictor of invisibility
Across all audits, one pattern explained more visibility failures than any other: the gap between how a company describes itself and how buyers describe their problem.
SaaS 6 (cybersecurity) positions itself using category-specific terminology — “managed EDR platform,” “managed security platform.” Its ideal customers, mid-size companies without security teams, ask about “cybersecurity tools,” “ransomware protection,” and “protect my company.” When AI generates search queries from a buyer’s question, it searches for the buyer’s language, not the vendor’s category terminology. The result: better-known competitors appear. SaaS 6 does not.
The one question where SaaS 6 dominated? One where the buyer’s vocabulary exactly matched the company’s positioning. The AI searched for those niche-specific terms, found SaaS 6 everywhere, and recommended it first.
The pattern held across every company we tested. SaaS 3 (project management) dominates broad “which tool is best” queries but is invisible when a buyer asks about their specific industry. SaaS 4 (payments) is the top pick for large-company questions but absent when a founder asks about payment processing for a new online store. In every case, the brand’s strongest visibility was on questions where buyer language and company language aligned.
Here are the vocabulary gap patterns we documented:
| Company | Buyer Searches For | Company Says | Result |
|---|---|---|---|
| SaaS 6 | “ransomware protection tools” | “managed EDR platform” | Invisible |
| SaaS 6 | “HIPAA compliant security” | “healthcare cybersecurity” | Invisible |
| SaaS 3 | “construction project management” | “work management platform” | Invisible (industry specialists win) |
| SaaS 3 | “replace legacy PPM tool” | “work management platform” | Weak (enterprise PPM vendors win) |
| SaaS 4 | “payment processor for online store” | “unified commerce platform” | Listed with caveat (“not for beginners”) |
| SaaS 5 | “social media tools for small business” | “comprehensive social management” | Invisible (SMB-focused tools win) |
This is not a content quality problem. In multiple cases, the company had dedicated pages with relevant content — healthcare-specific pages, industry landing pages, templates for specific use cases. The content exists. It simply does not rank for the terms buyers actually use, which means AI never finds it during retrieval. This is the same disconnect between Google rankings and AI recommendations that we explore in why your SEO strategy is ignoring 37% of buyers.
RAG vs. parametric: which engine decides
Every AI response comes through one of two mechanisms. Understanding which one governs your category changes your entire optimization strategy.
RAG (retrieval-augmented generation) is when the AI searches the web in real time, pulls relevant passages, and assembles an answer with citations. This powered the vast majority of questions in our audit — roughly 80% of all responses were RAG-driven. When you see linked sources beneath an AI answer, that is RAG.
Parametric knowledge is when the AI answers from patterns compressed into its training data, with no web search. This is rarer but significant. For one company (SaaS 1, process mining), one question triggered zero web searches — the AI recommended the company entirely from training data. No search queries. No citations. Pure parametric recall.
The practical implication: if your visibility depends on RAG, you can improve it in weeks by restructuring content and earning citations. If it depends on parametric knowledge, you are waiting for the next model training cycle — a timeline measured in months. Most companies need both, but the ratio determines which team owns the work.
In our data, SaaS 1 was the only brand with confirmed parametric presence. The remaining five companies’ visibility was almost entirely RAG-driven — meaning their appearance in AI answers depends on whether their web content ranks for the terms the AI searches for. That is both the opportunity (restructure content, see results in weeks) and the current gap (content may exist but not rank for buyer vocabulary).
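If you log which answers involved a web search, the RAG/parametric split falls out of the same audit data. A sketch, again assuming the hypothetical AuditRecord structure from earlier:

```python
def rag_ratio(records: list[AuditRecord]) -> float:
    """Fraction of answers assembled from live web search (RAG) rather than
    recalled from training data (parametric)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.used_web_search) / len(records)

# A ratio near 1.0 means content and citation work can move the needle in weeks;
# a meaningful parametric share means the timeline is tied to model training cycles.
```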
What AI actually cites
Across 150+ search queries generated during these audits, clear patterns emerged in which sources AI models cited most often when assembling recommendations:
- G2 appeared as a cited source in the majority of audits. AI models treat G2 reviews as high-trust third-party validation. If your G2 listing has outdated pricing or a stale product description, that stale information becomes AI’s “fact.”
- Gartner dominated when buyers asked about enterprise-grade solutions. In one audit, Gartner Magic Quadrant data appeared in most answers. A Gartner leadership position was a primary factor in AI’s recommendation for enterprise questions.
- Company-owned blog content ranked surprisingly well in raw search results. Multiple companies’ own blog posts appeared as search hits for category queries. But AI summaries still favored competitors when third-party editorial sources carried more weight in the final synthesis.
- Industry-specific publications dominated when buyers asked about their specific industry. Industry searches surfaced ecosystem content from specialist publications. General B2B outlets had almost no influence on those questions.
The finding that matters: AI models treat third-party citations as more authoritative than company-owned content during answer synthesis, even when the company’s own content appears in the underlying search results. In one audit, the company’s own cybersecurity guide appeared in raw search results, but AI still recommended better-known competitors because independent review sites recommended them more prominently.
Limitations and caveats
This is a small sample. That is enough to identify patterns but not enough to draw statistical conclusions. A few things to keep in mind:
- Small sample sizes produce extreme scores. Change one result and a score swings by double digits. We would not read too much into the precise percentages — the patterns across companies are more informative than any individual score.
- AI responses are non-deterministic. Running the same question twice can produce different results, different search queries, and different cited sources. Our results represent one snapshot, not a stable measurement. A production monitoring system would need to test repeatedly over time.
- One model, one day. These audits used a single AI model on a single day. Different models (GPT-4, Gemini, Perplexity) may surface different brands, different sources, and different rankings. Cross-platform audits would likely reveal a more nuanced picture.
- We selected from a curated list. The companies were randomly sampled from BuiltIn’s top B2B SaaS list, which skews toward well-funded, US-based companies. Results for smaller or international brands might look very different.
We publish this data because the patterns are consistent and actionable even in a small sample. The vocabulary gap, the RAG/parametric split, and the third-party citation hierarchy appeared across every audit. But treat the specific scores as directional, not definitive.
What this means for your brand
Even with the caveats above, these audits surface three specific, testable problems. Each one has a corresponding fix that does not require new budget or new headcount.
1. Test your vocabulary alignment today
Think about five different buyers in your market. For each one, write the question they would actually ask an AI chatbot — in their language, not yours. Run those queries. If you do not appear, check whether the AI’s search queries use your category terms or the buyer’s problem terms. The gap between those two vocabularies is the gap you need to close.
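A crude way to quantify that gap is to diff the vocabulary of real buyer queries against your own positioning copy. The sketch below is a deliberately simple first pass with hypothetical inputs; in practice you would feed it the search queries the AI actually generated.

```python
import re


def vocabulary_gap(buyer_queries: list[str], positioning_copy: str) -> set[str]:
    """Terms buyers use that never appear in the company's own positioning copy."""
    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z][a-z\-]+", text.lower()))

    buyer_terms = set().union(*(tokens(q) for q in buyer_queries))
    return buyer_terms - tokens(positioning_copy)


# Hypothetical example:
# vocabulary_gap(["ransomware protection tools for small business"],
#                "managed EDR platform for the mid-market")
# -> {"ransomware", "protection", "tools", "small", "business"}
```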
In our data, vocabulary alignment was the single strongest predictor of whether a brand appeared. SaaS 6 went from invisible to the top recommendation when the query used its niche vocabulary. SaaS 3 went from top-3 to invisible when the query used industry-specific terminology. The product did not change. The words did.
2. Audit your third-party listings
G2 appeared in the majority of audits as a cited source. Gartner dominated enterprise questions. If your G2 listing still describes your 2024 product, AI is recommending your 2024 product. If you are not in the Gartner conversation for your category, AI is not recommending you to enterprise buyers. These are not SEO concerns. They are AI citation sources.
3. Determine your RAG/parametric split
Ask an AI to recommend a product in your category. If the response includes linked citations, your visibility is RAG-driven and improvable through web content changes. If it answers confidently with no citations, you are dealing with parametric knowledge — a longer timeline requiring presence in the sources that training pipelines prioritize (Wikipedia, Gartner, major news outlets).
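A quick heuristic for this check: if the answer carries linked sources or inline URLs, treat it as RAG-driven; if it is confident but sourceless, treat it as parametric. A minimal sketch (the function name and inputs are our own illustration):

```python
import re


def looks_like_rag(response_text: str, cited_urls: list[str] | None = None) -> bool:
    """True if the answer appears to have been assembled from live web search."""
    if cited_urls:
        return True
    return bool(re.search(r"https?://", response_text))
```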
In our sample, only one company had confirmed parametric presence. The rest were entirely RAG-dependent. Knowing which one you are determines whether your content team or your PR team should lead the response.
The core finding across every audit: AI does not recommend the best product. It recommends the product it can find using the buyer’s language, from sources it trusts. Market share, funding, even product quality — none of these guarantee visibility. Vocabulary alignment, third-party citations, and content structure do.
Methodology: Structured AI visibility audits conducted March 2026, using an AI model with web search augmentation. Companies randomly sampled from BuiltIn’s top B2B SaaS companies list across multiple categories. For each company, we asked the types of questions real buyers in that category would ask. Over 150 search queries were generated autonomously by the AI model during audits. Results anonymized. Scores are directional — small sample sizes mean individual percentages should be treated as indicative, not definitive.