How to optimize (GEO) a website for generative AI (LLMs)
Content Discoverability
Ensuring that content created for AI is discoverable, so that it can be integrated into their models, is the foundation of GEO. While the site's navigation will be crawled by search engines, it is also important to increase discoverability using the following methods:
- Adding links to related articles from the main article (for example, with a "Related Articles" section) to avoid "orphan" content.
- Implementing an XML sitemap.
- Implementing link tags with the appropriate rel attribute (prev, next, alternate, etc.).
- Configuring non-blocking meta tags (index, follow, etc.).
- Configuring non-blocking server response headers.
- Implementing an RSS feed, which can be used by some crawlers.
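As an illustrative sketch, several of these in-page signals can live in the document head (URLs below are placeholders):

```html
<head>
  <!-- Non-blocking robots meta: allow indexing and link following -->
  <meta name="robots" content="index, follow">
  <!-- Pagination hints for paginated content (illustrative URLs) -->
  <link rel="prev" href="https://example.com/blog/page/1">
  <link rel="next" href="https://example.com/blog/page/3">
  <!-- Advertise the RSS feed so crawlers can discover new content -->
  <link rel="alternate" type="application/rss+xml" title="Blog feed" href="https://example.com/feed.xml">
</head>
```

The XML sitemap itself is typically declared in robots.txt (`Sitemap: https://example.com/sitemap.xml`) rather than in the page.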
Cookie walls and paywalls
Content hidden behind a cookie wall or a highly restrictive paywall will not be indexed correctly. These walls must be implemented with careful attention to discoverability. The content must be accessible to search engine crawlers, even if it is restricted for the visitor.
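One documented way to reconcile a paywall with discoverability (this is the approach described in Google's paywalled-content structured data guidance) is to serve the full content to crawlers while declaring the restricted section in JSON-LD, so the paywall isn't mistaken for cloaking. A sketch, with a hypothetical `.paywalled` CSS class:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example article",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywalled"
  }
}
</script>
```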
Asynchronously injected content
The same applies to content injected solely on the client side (asynchronous JavaScript): it might not be indexed unless implemented with great care. Websites built with application-oriented technologies (such as React) may perform less well if compensatory measures, such as server-side rendering or prerendering, aren't put in place.
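A minimal illustration of the risk (the endpoint is hypothetical): the HTML delivered to the crawler contains no article text at all, since the text only exists after the script runs.

```html
<!-- What the crawler receives: an empty container -->
<div id="article"></div>
<script>
  // The content is fetched and injected client-side, after page load
  fetch('/api/article/42')
    .then(response => response.json())
    .then(data => {
      document.getElementById('article').textContent = data.body;
    });
</script>
```

Server-side rendering or prerendering delivers the finished HTML instead, so crawlers that don't execute JavaScript still see the content.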
Optimizing Formats
In addition to being well-written and discoverable, the content that best fits AI is presented in a pragmatic format, free of unnecessary clutter. Offering easily digestible formats for this new audience (AI) amplifies the impact of results.
Since HTML's early days, website source code has become increasingly complex in order to create visually appealing, animated, and responsive websites designed to please humans. AI crawlers, however, are looking for structured, semantically organized content; any elements that don't contribute to it are simply noise to them.
Fortunately, the recommendation here isn't to remove all graphical elements from the web. At a minimum, it's about implementing a simple, hierarchical, and valid DOM. Or, even better, offering alternative versions of the content: one version for human visitors and another for visitors who prefer the raw content.
Some of these alternative versions could be favored and given greater prominence by search engines.
JSON-LD/schema.org
Already present on a large number of websites, JSON-LD (or schema.org) data is well understood by AI indexing engines. Integrating structured JSON-LD/schema.org data therefore remains a key lever for making your content usable by AI. Consider the following entity types:
- Article
- Person/Organization for authors
- FAQPage/QAPage
- BreadcrumbList
- HowTo
- Product
- Event
To get the most out of this, prioritize a unique JSON-LD identifier (@id) per page, stable identifiers, and cross-references (publisher, author, isPartOf) to link entities. It is also possible to link your content with other web entities using sameAs (Wikidata, LinkedIn, GitHub, etc.). This allows you to link your content to the AI's overall representation of its subject.
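Putting these recommendations together, a sketch might look like this (all names, identifiers, and URLs are placeholders to be replaced with your own):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://example.com/blog/my-article#article",
  "headline": "My article",
  "isPartOf": { "@id": "https://example.com/blog#blog" },
  "publisher": { "@id": "https://example.com/#organization" },
  "author": {
    "@type": "Person",
    "@id": "https://example.com/#author",
    "name": "Author Name",
    "sameAs": [
      "https://www.wikidata.org/wiki/QXXXXXXX",
      "https://github.com/example"
    ]
  }
}
</script>
```

The stable `@id` values let separate pages reference the same author and publisher entities, while `sameAs` anchors them to entities the AI already knows.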
Markdown
The Markdown format is also very well suited to unambiguous interpretation, since it is strictly composed of plain, structured text. The absence of graphical elements and its syntactic simplicity mean that all the content can be used by AI, unlike HTML where the indexing engine must interpret and filter the elements of a complex DOM (https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model).
This is the hypothesis put forward by Dries Buytaert, the creator of Drupal. A perfectly relevant and plausible hypothesis in my opinion.
For example, the article you are currently reading is the standard version, intended for a regular visitor. But by adding ".md" to your browser's address bar, you can access the Markdown format of the same article. The website then discloses the existence of this alternative format using a <link> tag.
<link rel="alternate" type="application/markdown" title="How to optimize (GEO) a website for generative AI (LLMs)" href="https://jmcouillard.com/en/blog/how-optimize-geo-website-generative-ai-llms.md" />
Several formats can therefore coexist for the same content page.
Make it easier to quote
AI responses increasingly include quotes taken directly from specific website content. It is therefore essential to facilitate citations to ensure proper attribution:
- Use short paragraphs that are easier to represent in vector databases for models.
- Use stable URLs and anchors so that a quote leads to a specific block of content.
- Use <link rel="canonical"> to provide a URL that clearly identifies the content.
- Use 301 redirects if URLs change.
- For multilingual content, use separate URLs for each language with consistent hreflang tags to avoid quotes in the wrong language.
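The canonical and hreflang recommendations above can be sketched as follows in the document head (URLs are placeholders):

```html
<!-- One URL that unambiguously identifies this content -->
<link rel="canonical" href="https://example.com/en/blog/my-article">
<!-- Separate URLs per language, cross-referenced so quotes land in the right language -->
<link rel="alternate" hreflang="en" href="https://example.com/en/blog/my-article">
<link rel="alternate" hreflang="fr" href="https://example.com/fr/blog/mon-article">
```

Each language version should carry the same set of hreflang tags, including one pointing at itself.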
Do not restrict access to indexing engines (crawlers)
Website and server settings can be configured to tell search engines how to interact with your content. If you want your content to be visible in search engines and usable by AI assistants, make sure your settings don't forbid indexing.
The directives included in the robots.txt file and the X-Robots-Tag HTTP header are part of this, and you must ensure they reflect your intentions. Also, be careful to treat indexing bots (Googlebot, Bingbot) and model-training bots (e.g., Google-Extended, Applebot-Extended, GPTBot, Claude/Claude-Web, PerplexityBot, CCBot) as separate entities: the former should generally remain allowed, while the latter can be adjusted according to your data policy.
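A robots.txt sketch of this separation (adjust the bot list and rules to your own data policy):

```
# Keep search indexing open
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Restrict model-training bots
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Keep in mind that robots.txt is advisory: compliant crawlers honor it, but it is not an access control mechanism.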
Similarly, Cloudflare offers bot-management features (including an "AI Scrapers & Crawlers" filter) that can block crawlers and therefore prevent page discovery or content updates.
At all times, ensure that these settings align with your objectives.
All articles in the series
Why create and optimize content for AI?
How to write relevant GEO optimized content
How to optimize (GEO) a website for generative AI (LLMs)
What results can be expected from GEO optimizations?