GEO in 2026: how AI engines decide what to cite, and how to win
Generative Engine Optimization is the new discipline of getting cited by ChatGPT, Claude, Perplexity, and Gemini. Here is what we have learned from running it in production for clients.
Generative Engine Optimization (GEO) is the discipline of getting your content cited by AI engines (ChatGPT, Claude, Perplexity, Gemini, You.com) when they answer a user’s question. In 2026 these engines capture a meaningful share of the high-intent commercial queries that used to land on Google’s blue links. For many B2B sales pipelines, AI-engine citations are now a top-five inbound channel.
This post is a working summary of what we have learned running GEO programs for clients, including the technical patterns, the content patterns, and the false promises.
How AI engines pick citations
In broad strokes, the four major engines use similar machinery:
1. Retrieval: when a user query comes in, the engine runs a search against a corpus (often Bing or Google’s web index, sometimes their own crawl, sometimes a third-party retrieval API like You.com or Perplexity’s). The top ~20-50 results are pulled.
2. Re-ranking: the candidates are scored by an embedding-based model for relevance to the user query. Roughly the top 5-15 survive.
3. Generation with citations: the LLM is given the retrieved context and asked to answer the user’s question while citing the sources it used. The model picks which sources to cite based on which ones it actually drew language from.
4. Display: the final answer surfaces source citations as inline footnotes or a “Sources” list.
The leverage points are at steps 1-2 (your content has to be retrieved and re-ranked highly) and step 3 (your content has to be the kind the model picks to cite).
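To make the retrieval and re-ranking steps concrete, here is a toy sketch in Python. It is not how any specific engine works: the word-count "embedding", the function names, and the example URLs are all assumptions for illustration, and production systems use dense neural embeddings and far larger indexes.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real engines use dense neural vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int) -> list[str]:
    """Step 1: keep up to k documents that share at least one term with the query."""
    q_terms = set(embed(query))
    hits = [url for url, text in corpus.items() if q_terms & set(embed(text))]
    return hits[:k]


def rerank(query: str, corpus: dict[str, str], candidates: list[str], k: int) -> list[str]:
    """Step 2: order candidates by similarity to the query and keep the top k."""
    q_vec = embed(query)
    ranked = sorted(candidates, key=lambda url: cosine(q_vec, embed(corpus[url])), reverse=True)
    return ranked[:k]


# Hypothetical three-page corpus.
corpus = {
    "https://example.com/faq": "What is high-risk payment processing? It is card "
    "acceptance for merchants with elevated chargeback risk.",
    "https://example.com/about": "We deliver world-class outcomes for our partners",
    "https://example.com/blog": "Payment trends in 2026",
}
query = "what is high-risk payment processing"
candidates = retrieve(query, corpus, k=20)
survivors = rerank(query, corpus, candidates, k=2)
```

Even in this toy version, the page that answers the question directly ranks first, and the promotional page never enters the candidate set because it shares no terms with the query.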
What gets retrieved and re-ranked
The retrieval/re-rank step is mostly traditional SEO with embedding-based modernisations:
- Authority signals: backlinks, domain age, brand mentions, schema.org markup. The same things that worked for Google in 2020 still matter
- Topical depth: does your site have multiple pieces of content on the topic, or just one shallow page?
- Freshness: dateModified, dateModified, dateModified. AI engines aggressively prefer recent content for time-sensitive queries
- Semantic match quality: not just keyword presence but whether the document’s embeddings sit near the query’s embedding in vector space. Synonyms and concept-coverage matter more than exact phrases
Most established SEO playbooks transfer here. If you rank well on Google, you will mostly rank well in AI-engine retrieval.
What gets cited (the GEO-specific layer)
This is where GEO diverges from SEO. Among the documents that get retrieved, which ones get cited in the final answer? In our analysis of 200+ Perplexity and ChatGPT answers across queries our clients care about:
The most-cited formats:
- FAQ blocks — direct question/answer pairs. AI engines extract these almost verbatim. This is why we ship FAQ JSON-LD on every page that has one
- Comparison tables — “X vs Y vs Z” with explicit columns and rows. Trivial for an AI engine to convert into a comparison answer
- Lists of named entities — “the 5 best providers of X in MENA” with specific names. Easy to cite
- Definitions — “What is high-risk payment processing?” followed by a clear 2-3 sentence answer
- Numbered procedures — “How to choose between Claude and GPT” with explicit steps
The least-cited formats:
- Long narrative paragraphs without clear extraction points
- Marketing-speak (“we deliver world-class outcomes”) — AI engines down-weight promotional language
- Content behind paywalls or login walls (obviously)
- Content with no citation-worthy claims (vague, opinion-light, no specific numbers)
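The FAQ JSON-LD mentioned above can be generated from the same question/answer pairs that render on the page. The following is a minimal sketch using the schema.org FAQPage, Question, and Answer types; the helper names `faq_jsonld` and `faq_script_tag` are our own for illustration, not part of any library.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialise (question, answer) pairs as a schema.org FAQPage JSON-LD document."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def faq_script_tag(pairs: list[tuple[str, str]]) -> str:
    """Wrap the JSON-LD in a script tag, escaping '</' so the payload cannot close the tag early."""
    payload = faq_jsonld(pairs).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


pairs = [
    (
        "What is high-risk payment processing?",
        "Card acceptance for merchants with elevated chargeback risk.",
    ),
]
tag = faq_script_tag(pairs)
```

Feeding the generator from the same data that renders the visible FAQ keeps the markup and the page text in sync by construction.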
Schema.org markup that actually moves the needle
In rough order of impact on GEO citations:
| Schema | Why it matters | Where to use |
|---|---|---|
| FAQPage | Most-cited format; structures Q&A clearly | Every page with a FAQ section |
| Article / BlogPosting | Identifies citable long-form content | Every blog post and article |
| Service | Identifies what your business does at entity level | Every service page |
| Organization + LocalBusiness | Entity graph anchor | Once per site, on every page |
| Person | E-E-A-T author attribution | Author bylines, team pages |
| BreadcrumbList | Helps engines understand site structure | Every non-home page |
| Review + aggregateRating | Trust signal; cited for “best X” queries | Where you have authentic reviews |
| HowTo | Captures procedure content | Tutorial / methodology pages |
| SpeakableSpecification | Indicates voice-readable content | Hero paragraphs, FAQ answers |
Schema is necessary but not sufficient. The visible content has to match. AI engines explicitly cross-check schema against visible HTML; bait-and-switch (schema says one thing, page says another) gets penalised.
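A schema-versus-visible-content mismatch can be caught before an engine sees it. The sketch below uses only the Python standard library and checks one narrow case: every FAQPage answer in the JSON-LD must also appear in the page’s visible text. The class and function names are hypothetical, and the exact-substring comparison is a simplifying assumption; answers that contain inline markup or differ in punctuation would need fuzzier matching.

```python
import json
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects JSON-LD payloads and visible text from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.jsonld: list[str] = []
        self.visible: list[str] = []
        self._mode = "text"  # one of "text", "jsonld", "skip"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            if dict(attrs).get("type") == "application/ld+json":
                self._mode = "jsonld"
                self.jsonld.append("")  # start a new payload buffer
            else:
                self._mode = "skip"
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._mode = "text"

    def handle_data(self, data):
        if self._mode == "jsonld":
            self.jsonld[-1] += data
        elif self._mode == "text":
            self.visible.append(data)


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return " ".join(text.lower().split())


def faq_mismatches(html: str) -> list[str]:
    """Return FAQ questions whose schema answer is missing from the visible text."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    visible = normalise(" ".join(parser.visible))
    missing = []
    for payload in parser.jsonld:
        data = json.loads(payload)
        if not isinstance(data, dict) or data.get("@type") != "FAQPage":
            continue
        for item in data.get("mainEntity", []):
            if normalise(item["acceptedAnswer"]["text"]) not in visible:
                missing.append(item["name"])
    return missing
```

Run against each rendered page, `faq_mismatches(html)` returns an empty list when markup and page agree, and the offending questions when they do not.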
The llms.txt standard
The llmstxt.org proposal is the de-facto 2025/2026 standard for guiding AI crawlers to a site’s most important resources in a token-efficient form. It is a markdown file at the root of your domain that lists key URLs with brief descriptions. ChatGPT, Claude, and Perplexity respect it for citation prioritisation.
Two files matter:
- llms.txt: a short index (typically 100-300 lines) listing canonical resources by section
- llms-full.txt: a full plaintext dump of all visible site content, concatenated for efficient AI ingestion
Both should be reachable at the root domain. The cost is minimal; the benefit is meaningful. Every serious B2B site should ship them.
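As a rough illustration, the index file can be generated from a site map you already maintain. The sketch below follows the layout described in the llmstxt.org proposal as we understand it (an H1 title, a blockquote summary, then H2 sections containing markdown link lists); the function name, the input shape, and the example URLs are assumptions.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt index from {section heading: [(title, url, description), ...]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site map.
llms_txt = build_llms_txt(
    "Example Co",
    "B2B payments consultancy serving MENA merchants.",
    {
        "Services": [
            ("GEO audit", "https://example.com/services/geo-audit", "Scope and deliverables"),
        ],
        "Guides": [
            ("High-risk processing", "https://example.com/guides/high-risk", "Definition and FAQ"),
        ],
    },
)
```

Writing the result to `/llms.txt` at deploy time keeps the index from drifting out of date as pages are added or retired.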
What does NOT work
Things we have tried that did not deliver:
- Keyword stuffing in invisible content — AI engines fingerprint and discount this. It is also a Google penalty
- Cloaking (showing different content to crawlers) — AI engines crawl with multiple agents and cross-check. Cloaking gets you de-indexed
- Sponsored content disguised as editorial — both Google and AI engines have become good at detecting it. The penalty is severe and persistent
- Excessive listicles (“Top 50 of X”) — the diminishing-returns curve flattens fast. Better to have 5 deeply substantive pieces than 50 shallow ones
- Pure AI-generated content with no editorial signature — AI engines specifically down-weight content that smells AI-written and unedited. Substance, specificity, named-entity references, and writer-voice all signal “this came from a human who knows something”
Building a GEO program
A pragmatic 90-day program for a B2B site:
Weeks 1-2: Audit
- Crawl your site, identify pages that should rank for your target queries
- Run those queries through ChatGPT, Claude, and Perplexity manually; record which pages currently get cited (if any)
- Audit schema.org markup against what we listed above
- Check whether robots.txt allows GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and others
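The robots.txt check in the audit can be scripted with Python’s standard `urllib.robotparser`. This is a minimal sketch: the crawler list mirrors the user agents named above, the function name is our own, and the rules are passed in as text so the example runs without network access (in practice you would fetch the live file first).

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the audit step; extend as needed.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Report, per AI crawler, whether the given robots.txt rules allow fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


# Hypothetical rules that block one crawler and allow everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
access = crawler_access(robots_txt, "https://example.com/pricing")
blocked = [agent for agent, allowed in access.items() if not allowed]
```

Any agent that lands in `blocked` is a page the corresponding engine cannot crawl, and therefore cannot cite from its own index.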
Weeks 3-6: Foundation
- Ship robots.txt explicitly allowing major AI crawlers
- Ship llms.txt and llms-full.txt
- Add FAQPage schema to every page with a FAQ
- Add Service and BreadcrumbList schema where applicable
- Fix any schema-vs-visible-content mismatches
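Of the foundation items, BreadcrumbList markup is the easiest to automate because it can be derived from the URL itself. The sketch below assumes that URL path segments map one-to-one onto breadcrumb levels and that slugs can be title-cased into readable names; both are simplifications, and the function name is hypothetical.

```python
import json
from urllib.parse import urlsplit


def breadcrumb_jsonld(url: str, home_name: str = "Home") -> dict:
    """Build a schema.org BreadcrumbList from the path segments of a URL."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    crumbs = [(home_name, origin + "/")]
    path = ""
    for segment in filter(None, parts.path.split("/")):
        path += "/" + segment
        # Assumption: the slug is a fair stand-in for the page title.
        crumbs.append((segment.replace("-", " ").title(), origin + path))
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": link}
            for position, (name, link) in enumerate(crumbs, start=1)
        ],
    }


markup = json.dumps(breadcrumb_jsonld("https://example.com/services/geo-audit"), indent=2)
```

If your CMS stores real page titles, prefer those over the title-cased slug so the breadcrumb names match what visitors see.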
Weeks 7-10: Content
- For each high-value query, ensure you have one substantive page (1500+ words) that answers it
- Add comparison tables, named-entity lists, and clear definitions to existing pages
- Expand thin pages; cull or merge near-duplicate pages
Weeks 11-12: Measurement
- Re-run the manual citation check from week 2
- Set up a monthly tracker (a simple spreadsheet noting which pages get cited for which queries)
- Plan the next 90-day cycle based on the gaps
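The monthly tracker can stay a spreadsheet and still produce a number to watch. As a sketch, assume it is exported as CSV with the columns `date`, `engine`, `query`, and `cited_url`, where `cited_url` is left empty when none of your pages was cited. That column layout is our assumption, not a standard.

```python
import csv
import io
from collections import defaultdict


def citation_rates(csv_text: str) -> dict[str, float]:
    """Share of checked queries, per engine, where one of our pages was cited."""
    checked: dict[str, int] = defaultdict(int)
    cited: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        checked[row["engine"]] += 1
        if row["cited_url"].strip():
            cited[row["engine"]] += 1
    return {engine: cited[engine] / checked[engine] for engine in checked}


# Hypothetical tracker export.
tracker_csv = """\
date,engine,query,cited_url
2026-01-05,ChatGPT,best geo agency,
2026-01-05,ChatGPT,what is geo,https://example.com/guides/geo
2026-01-05,Perplexity,what is geo,https://example.com/guides/geo
"""
rates = citation_rates(tracker_csv)
```

Comparing these per-engine rates between the week 2 baseline and the week 12 re-run gives the lift figure for the cycle.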
What you cannot control
Two structural realities to absorb:
- You will not get citations on every query. AI engines try to cite from diverse sources; if a competitor’s content is comparable and they got there first, they keep the citation. The work is incremental
- The citation surface is shifting fast. Today’s “ChatGPT cites you” can become “Gemini cites you” or “Perplexity replaces ChatGPT in your buyer’s habit.” Build for the discipline, not for one specific engine
We expect GEO to consolidate over the next 12-18 months as the engines’ citation algorithms converge. The fundamentals — substantive content, clean schema, authoritative entity signals — work across all of them.
Get in touch
If you would like us to audit your site for GEO readiness and design the program, contact us at contact@kalastor.net. Typical engagement: 90 days to a measurable lift in AI-engine citation rate.
Adjacent reading: Claude vs GPT vs Gemini vs Mistral comparison, State of AI adoption in Egyptian enterprises.