Last updated: March 27, 2026
When ChatGPT, Perplexity, or Google AI Search decides whether to cite your website, it does not rank pages the way Google does. There is no PageRank equivalent. Instead, AI models evaluate a set of trust and structure signals that tell them: is this source reliable enough to quote?
The Princeton GEO study (Aggarwal et al., KDD 2024) identified that embedding specific, sourced statistics increases LLM citation probability by 37%. That is one signal out of twelve. Konrad Kluz uses the full list below as the baseline for every GEO audit Geovise delivers.
Here is what AI checks before citing your website.
Diagnostic question: Does your page have exactly one H1 that states the topic, followed by H2 and H3 headings that break the content into named sections?
AI models parse heading hierarchy to identify discrete topic chunks worth quoting. A flat page with one H1 and ten consecutive paragraphs gives the model no anchors to extract from. Each H2 should answer a specific question; each H3 should narrow it further. Eviacharge.pl uses a strict H1/H2/H3 hierarchy across all service pages, which is one of the structural reasons it appears in ChatGPT answers about EV charging in Poland.
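As an illustration, a service page built this way has a skeleton like the following; the topic and section names are hypothetical, not taken from any real page:

```html
<!-- One H1 stating the topic; each H2 answers a specific question;
     each H3 narrows it further into an extractable chunk. -->
<h1>EV Charging Station Installation in Poland</h1>

<h2>How much does a home EV charger cost?</h2>
  <h3>Hardware price ranges</h3>
  <h3>Installation labor costs</h3>

<h2>How long does installation take?</h2>
  <h3>Single-family homes</h3>
  <h3>Apartment buildings</h3>
```

Each heading becomes a named anchor the model can quote from, instead of one undifferentiated wall of paragraphs.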
Diagnostic question: Does your content have a named author with a bio, credentials, and a link to a professional profile?
E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness. Google codified this in its Search Quality Evaluator Guidelines, and AI models use the same signals when deciding source quality. An anonymous article is cited less. An article by Konrad Kluz, GEO practitioner at Geovise, with a LinkedIn profile and a verified case study, is cited more. Author schema (Person) on each article page is not optional if you want AI citations.
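A minimal Person schema for an article author can be embedded as JSON-LD. This is a sketch; the profile URL is a placeholder, not a real link:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Konrad Kluz",
  "jobTitle": "GEO & SEO Specialist",
  "worksFor": { "@type": "Organization", "name": "Geovise" },
  "sameAs": ["https://www.linkedin.com/in/your-profile"]
}
</script>
```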
Diagnostic question: Do your pages include structured data for Article, FAQPage, Person, and Service types?
Schema.org markup is machine-readable metadata that tells AI exactly what each piece of content is. FAQPage schema is particularly powerful: it pre-packages question-answer pairs in the format LLMs prefer to extract. Article schema signals publication date and authorship. Service schema tells the model what you do and where. Without schema, the AI has to guess the context from plain text, and it often guesses wrong.
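For example, a minimal Article markup looks like the sketch below; the headline, dates, and publisher details are illustrative placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "GEO Audit Checklist: 12 Signals AI Checks Before Citing You",
  "author": { "@type": "Person", "name": "Konrad Kluz" },
  "datePublished": "2026-03-01",
  "dateModified": "2026-03-27",
  "publisher": { "@type": "Organization", "name": "Geovise" }
}
</script>
```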
Diagnostic question: Does your page score above 90 on Google PageSpeed Insights, and do LCP, INP, and CLS meet Google's Core Web Vitals thresholds?
AI crawlers and search bots have limited time and resources per page. A slow page gets partially crawled or skipped. Core Web Vitals thresholds: LCP under 2.5 seconds, INP under 200 milliseconds, CLS under 0.1. Google replaced FID with INP in March 2024 because INP measures the full responsiveness of a page, not just the first interaction. These are not just Google ranking factors. They are signals of page quality that indirectly influence which content gets fully indexed and therefore cited.
Diagnostic question: Does your website have a /llms.txt file that tells AI crawlers which content is authoritative and worth reading?
Proposed by Jeremy Howard in 2024 and adopted at llmstxt.org, the llms.txt file is a Markdown document at your site root that curates your most LLM-relevant content. Unlike a sitemap (which lists everything), llms.txt points AI crawlers to the pages and documents that matter most. It is the difference between handing an AI a library card catalog and handing it a recommended reading list. Most SMB websites do not have one. That alone is a GEO gap.
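A minimal llms.txt, following the structure proposed at llmstxt.org, might look like this; the URLs and descriptions are illustrative, not real pages:

```markdown
# Geovise

> Boutique GEO and SEO consultancy helping SMBs get cited in AI search.

## Services

- [GEO Audit](https://example.com/geo-audit): 12-signal audit with a prioritized action list

## Articles

- [What is llms.txt?](https://example.com/what-is-llms-txt): How the file works and how to create one
```

An H1 title, a blockquote summary, and H2 sections listing links with one-line descriptions: that is the whole format.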
Diagnostic question: Are your company name, founder name, and product name written identically across your website, Google Business Profile, LinkedIn, and every mention on third-party sites?
AI models build a knowledge graph of entities: people, companies, products, places. If your brand appears as "Geovise", "GeoVise", and "GEOVISE" across different sources, the model treats these as three different entities and dilutes citation authority across all three. Entity consistency is the single cheapest GEO win: an audit pass across your own web properties costs nothing except attention.
Diagnostic question: Do your pages show a visible last-updated date, and has the content been reviewed or revised within the last six months?
LLMs are trained on data with a knowledge cutoff, but retrieval-augmented systems like Perplexity and Bing Copilot actively fetch and prefer fresh content. A page last updated in 2021 signals to both crawlers and models that the information may be stale. Displaying the update date explicitly (not just in a meta tag) makes freshness machine-readable and user-readable simultaneously.
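One way to make the update date both user-visible and machine-readable at once is the HTML time element; the date shown is a placeholder:

```html
<!-- Visible to readers on the page, parseable by crawlers via the
     machine-readable datetime attribute. -->
<p>Last updated: <time datetime="2026-03-27">March 27, 2026</time></p>
```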
Diagnostic question: Does each article link to at least two related pages on your own site and at least one credible external source with a citation?
Internal links tell AI crawlers that your content is part of a coherent knowledge cluster, not an orphaned page. External links to authoritative sources (academic papers, government data, recognized industry bodies) signal that you have done the work of cross-referencing your claims. The Princeton GEO study found that citing sources within content increases LLM citation probability significantly. An article that references nothing appears unverifiable to a model designed to evaluate source reliability.
Diagnostic question: Does your page answer the questions users actually type into AI chatbots, not just the keywords they type into Google?
AI search queries are conversational. "How long does a GEO audit take?" not "GEO audit duration". Optimizing for traditional keywords misses the format AI expects. Every service page and article should include a FAQ section that mirrors how real users phrase questions to ChatGPT or Perplexity. FAQPage schema on top of this makes the Q&A directly extractable without the model needing to parse your prose.
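Using the conversational query above as the question, a single-entry FAQPage markup sketch looks like this; the answer text is an example, and a real page would list every Q&A pair in the mainEntity array:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "How long does a GEO audit take?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A Geovise GEO Audit is delivered within 5 business days."
    }
  }]
}
</script>
```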
Diagnostic question: When you state a statistic or factual claim, do you name the source inline and link to it?
The Princeton GEO research found that embedding specific, sourced statistics increases LLM citation probability by 37%. The mechanism is straightforward: AI models are trained to identify and propagate verifiable claims. An unsourced claim ("studies show that...") is less trustworthy to a model than a sourced claim ("the Aggarwal et al. KDD 2024 study found that..."). Write for verifiability, not just readability.
Diagnostic question: If you serve multiple markets, do your pages have correct hreflang tags, and is each language version a proper adaptation rather than a machine translation?
AI models that serve specific language markets (GPT-4 in German, Perplexity in Polish) prefer sources that explicitly signal geographic and linguistic relevance. Hreflang tells both Google and AI crawlers which language version to use for which audience. A site without hreflang forces the model to guess language and region, which increases the chance of the wrong version being cited or no version being cited at all. Geovise serves PL, DE, and EN markets, each with a fully localized content layer.
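For a three-market site like this, the hreflang cluster in the head of each page version might look like the following; the URLs are placeholders:

```html
<!-- Every language version lists the full cluster, including itself,
     plus an x-default fallback for unmatched locales. -->
<link rel="alternate" hreflang="pl" href="https://example.com/pl/uslugi" />
<link rel="alternate" hreflang="de" href="https://example.com/de/leistungen" />
<link rel="alternate" hreflang="en" href="https://example.com/en/services" />
<link rel="alternate" hreflang="x-default" href="https://example.com/en/services" />
```

The tags must be reciprocal: if the Polish page points to the German page, the German page must point back, or the cluster is ignored.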
Diagnostic question: Does your content contain specific facts, numbers, or insights that are not available on competitor pages?
The Princeton GEO study found that content depth matters more than keyword optimization for LLM citation likelihood. AI models do not reward repetition of common knowledge. They reward novel, specific, verifiable information. A 400-word page about "what is GEO" that says the same things as 50 other pages will not be cited. A page that includes a real case study with specific numbers (eviacharge.pl cited in ChatGPT answers about EV charging within 6 weeks of GEO implementation) gives the model something genuinely citable.
| # | Signal | Priority |
|---|---|---|
| 1 | Heading structure (H1/H2/H3) | High |
| 2 | E-E-A-T: named author with credentials | High |
| 3 | Schema.org markup (Article, FAQPage, Person) | High |
| 4 | Page speed: Core Web Vitals (LCP, INP, CLS) | Medium |
| 5 | llms.txt file present and structured | Medium |
| 6 | Entity consistency across all channels | High |
| 7 | Content freshness with visible update date | Medium |
| 8 | Internal and external linking with citations | Medium |
| 9 | Conversational FAQ section | High |
| 10 | Sourced statistics cited inline | High |
| 11 | Hreflang for multi-market sites | Medium |
| 12 | Unique content with specific data | High |
Eviacharge.pl is an EV charging infrastructure company operating in Poland. Konrad Kluz ran a GEO audit and implementation on eviacharge.pl using all 12 signals on this checklist.
Before the audit: eviacharge.pl was invisible in AI search results for queries like "EV charging Poland", "ładowarki do samochodów elektrycznych" (Polish: "electric car chargers"), or "Ladestationen Polen" (German: "charging stations Poland").
After implementing all 12 GEO signals (heading structure, E-E-A-T author schema, FAQPage markup, entity consistency, llms.txt, and sourced statistics in content): eviacharge.pl began appearing in ChatGPT answers within 6 weeks. The site now ranks in both Google and AI-generated responses for EV-related queries in Polish.
None of the 12 signals required rebuilding the site. They were implemented on top of the existing structure through content updates, schema additions, and file-level changes.
The Geovise GEO Audit is a one-time engagement starting from €400. It covers all 12 signals on this checklist, delivers a prioritized action list, and includes a 30-minute call with Konrad Kluz to walk through the findings.
The audit tells you exactly where you stand today and what to fix first to start appearing in AI answers. It is the logical starting point before any GEO retainer.
Contact Geovise to book your GEO Audit. Get in touch via the contact form.
A GEO audit checklist is a structured list of 12 signals that AI models evaluate before citing a website in generated answers. Unlike an SEO audit that focuses on Google ranking factors, a GEO audit checklist covers AI-specific criteria: entity consistency, E-E-A-T signals, Schema.org markup, conversational FAQ coverage, llms.txt, and in-content citations with sources.
Learn more about our GEO and SEO services or view the full GEO Audit pricing.
FAQ
How long does a GEO audit take?
A Geovise GEO Audit is delivered within 5 business days. The audit covers all 12 AI citation signals, identifies gaps, and delivers a prioritized action plan. A 30-minute walkthrough call with Konrad Kluz is included.
Is a GEO audit different from an SEO audit?
Yes. An SEO audit focuses on Google ranking signals: backlinks, keyword placement, technical crawlability. A GEO audit focuses on AI citation signals: entity consistency, E-E-A-T, Schema.org markup, conversational FAQ coverage, llms.txt, and content depth. Some signals overlap (heading structure, page speed), but GEO audits check entirely different criteria that traditional SEO tools do not measure.
Which GEO signals should I fix first?
Start with entity consistency and E-E-A-T. Check that your company name, founder name, and product names are spelled identically everywhere: your website, Google Business Profile, LinkedIn, and any press mentions. Then add a named author with a bio and credentials to every article. These two changes are low-cost and have a high impact on AI citation likelihood.
Is llms.txt required for AI citations?
No, llms.txt is not a hard requirement. AI models crawl and cite sites without it. However, llms.txt signals to AI crawlers which content is most authoritative and relevant, which increases the probability of the right pages being cited. It is a low-effort, high-signal addition that most competitors have not implemented.
How quickly do GEO changes produce results?
From the eviacharge.pl case: after implementing the GEO recommendations, the site began appearing in ChatGPT answers within 6 weeks. Results depend on how many signals were missing before the audit and how quickly the changes are implemented. Entity consistency and FAQPage schema tend to produce the fastest measurable improvements.
Can a site rank well on Google and still be absent from AI answers?
Yes. Google rankings and AI citations are two separate visibility layers. A site can rank in positions 1 to 3 on Google and be completely absent from ChatGPT or Perplexity answers on the same topic. AI models use different evaluation criteria than Google's ranking algorithm. Strong Google SEO is a positive signal for GEO, but it does not guarantee AI citations.

Konrad Kluz is a GEO & SEO Specialist, senior software developer, and founder of Geovise, a boutique consultancy helping SMBs achieve visibility in both Google and AI search (ChatGPT, Perplexity, Google AI Overviews). Proven case study: eviacharge.pl.