Complete Guide: How to Create an llms.txt File
Learn how to create an llms.txt file with step-by-step setup, examples, tools, and best practices to manage AI crawlers on your site.
Most people know about robots.txt. Hardly anyone has heard of llms.txt. And that makes sense: it's new, still evolving, and not every site owner even thinks about it yet. But if you've been wondering how to create an llms.txt file, what to put in it, and whether it actually matters for your website, this guide will help.
I've been digging into how different sites use it, testing a few setups myself, and following the early SEO chatter. What I've noticed is most guides either give you a 200-word explainer or push you straight to a generator tool. That leaves a lot of gaps. So let's walk through it properly: step by step, with examples, tools, and a few best practices I've picked up along the way.
What exactly is an llms.txt file?
The simple version: it's a text file you put on your website (same way you do with robots.txt) that gives large language model (LLM) crawlers instructions about whether they can use your content.
It's kind of like hanging a sign on your digital front door. Search engines look for robots.txt; now some AI crawlers are starting to check for llms.txt. OpenAI's GPTBot and Anthropic's ClaudeBot, for example, are the bots site owners most often write rules for.
Do you need one? That depends. If you publish recipes, research, or original articles you'd prefer not to end up in a training dataset, it's worth adding. If you're running a local plumbing site with basic service pages, it might not matter.
And here's the thing: even though not every AI crawler respects llms.txt right now, standards like this often stick. Robots.txt wasn't a big deal at first either; now it's everywhere.
Why it matters (and why it might not)
Here's where I'll be honest: I don't know if llms.txt will end up being a universally respected standard. Some bots will ignore it. But the major players have said they'll check it, and that's a start.
The real value is signaling intent. You're essentially saying: "I'm fine with this" or "Hands off my stuff." For some businesses, that choice matters a lot. For others, maybe less.
So while it might not guarantee protection, the cost of setting it up is so low it feels like a no-brainer if you care about how your content is used.
How to create an llms.txt file (step by step)
This part is easier than you might expect.
1. Open a plain text editor. Not Word or Google Docs. Use Notepad, TextEdit (in plain text mode), or VS Code.
2. Create a new file called llms.txt. All lowercase, with the .txt extension.
3. Add rules for AI crawlers. The format looks a lot like robots.txt. For example:
User-agent: GPTBot
Disallow: /
That blocks OpenAI's GPTBot from your site. Want something more nuanced?
User-agent: GPTBot
Disallow: /private/
Allow: /blog/
This blocks GPTBot from private pages but allows blog posts.
4. Upload the file to your root directory. It should live at https://yoursite.com/llms.txt, the same spot as robots.txt.
5. Test that it's live. Type the URL into your browser. If you see your rules, you're good. (Prefer a script? There's a quick sketch just below.)
That's it. No coding required.
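None of that requires code, but if you'd rather verify from a script than a browser, here's a minimal Python sketch that fetches the file and prints what it finds. The yoursite.com domain is a placeholder; swap in your own.

import urllib.request
import urllib.error

# Placeholder domain; replace with your own site.
URL = "https://yoursite.com/llms.txt"

try:
    with urllib.request.urlopen(URL, timeout=10) as resp:
        print(f"HTTP {resp.status} for {URL}")
        print(resp.read().decode("utf-8"))
except urllib.error.HTTPError as e:
    print(f"HTTP {e.code}: the file probably isn't at the root yet.")

If you see your rules echoed back with an HTTP 200, the file is live.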
Concrete examples you can use
I find examples make it easier to picture how this works.
Example 1: A recipe blog
User-agent: GPTBot
Disallow: /members/
Allow: /
This blocks AI from paid "members-only" recipes but allows free ones.
Example 2: A SaaS startup
User-agent: *
Disallow: /dashboard/
Disallow: /api/
Allow: /
Keeps bots out of sensitive product areas but lets them see marketing pages.
Example 3: A full block
User-agent: *
Disallow: /
That's the "nope, nothing for you here" version.
Tangent: why this feels like déjà vu
Quick side note. I remember when GDPR rolled out in Europe and half the web panicked about cookie banners. At first, everyone was confused, implementations were messy, and some people ignored it. A few years later, cookie consent is just expected.
llms.txt feels similar. We're in the messy, early stage. Some sites are blocking, some are allowing, and some haven't even heard of it. But if enough big players keep checking for it, this could be standard practice in a few years.
Using tools to generate llms.txt
If editing text files isn't your thing, there are generators that do it for you. A few I've tested:
- Firecrawl's llms.txt generator: fast, minimal setup, good for beginners.
- llmstxt.org: no-frills, straight to the point.
- Writesonic's free generator: easy, though it feels more like a marketing funnel than a resource.
Honestly, if you can copy-paste text, you can build this file yourself. But if you manage multiple domains, a tool can save time.
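For the curious, here's a rough Python sketch of what these generators do under the hood: turn a set of per-bot rules into stacked User-agent blocks. The bot names and paths are purely illustrative, not any particular tool's output.

# Per-bot rules; bot names and paths are illustrative.
rules = {
    "GPTBot": {"disallow": ["/private/"], "allow": ["/blog/"]},
    "ClaudeBot": {"disallow": ["/"], "allow": []},
}

blocks = []
for bot, paths in rules.items():
    lines = [f"User-agent: {bot}"]
    lines += [f"Disallow: {path}" for path in paths["disallow"]]
    lines += [f"Allow: {path}" for path in paths["allow"]]
    blocks.append("\n".join(lines))

# Write the finished file; upload it to your site's root afterward.
with open("llms.txt", "w", encoding="utf-8") as f:
    f.write("\n\n".join(blocks) + "\n")

A useful side effect: it shows how several bots fit in one file. You stack User-agent blocks separated by blank lines, the same way robots.txt handles it.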
For a deeper dive, I put together a [guide to llms.txt automation tools] that compares features.
Common mistakes I keep seeing
Because this is new, a lot of people are messing it up.
- Wrong file location: putting it in /blog/llms.txt instead of the root. Bots won't see it there.
- Typos in bot names: GPTBot is case-sensitive. Write it wrong, rule doesn't apply.
- Mixing with robots.txt: they're separate files. Don't cram them together.
- Blocking everything by accident: I've seen sites that disallow / without realizing that's a full block.
If you're unsure, check with a validator. Some generators have one built in.
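If you want a quick local check instead, here's a rough Python sketch that flags the mistakes above. It's a sanity check, not a full validator, and the known-bots list is illustrative rather than exhaustive.

# Crawler names to check casing against; illustrative, not exhaustive.
KNOWN_BOTS = {"GPTBot", "ClaudeBot"}

with open("llms.txt", encoding="utf-8") as f:
    text = f.read()

for n, raw in enumerate(text.splitlines(), start=1):
    line = raw.strip()
    if line.lower().startswith("user-agent:"):
        bot = line.split(":", 1)[1].strip()
        # Flag names that match a known bot in everything but casing.
        for known in KNOWN_BOTS:
            if bot.lower() == known.lower() and bot != known:
                print(f"Line {n}: '{bot}' should probably be '{known}'")
    elif line.lower().startswith("disallow:"):
        if line.split(":", 1)[1].strip() == "/":
            print(f"Line {n}: 'Disallow: /' blocks everything. Intentional?")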
Need more troubleshooting help? I've got a [full guide on fixing llms.txt errors].
Best practices (for now)
This is still evolving, so "best practices" might change. But today, here's what makes sense:
- Keep rules simple; don't over-engineer.
- Use specific bot names if you only care about certain crawlers.
- Review your file every few months. Standards shift quickly.
- Pair llms.txt with other protections (like copyright notices or API authentication).
If you want a more detailed breakdown, I've put together a guide on [llms.txt best practices] that covers advanced SEO considerations.
Quick FAQs
Do I really need an llms.txt file? Not everyone does. But if you don't want AI using your content, it's worth having.
How do I test it? Visit yoursite.com/llms.txt. If the file loads, it's live.
Will it help my SEO? No. It's about AI training, not rankings.
Can I just add rules to robots.txt instead? No, they're separate. You should have both.
Wrapping it up
Here's the truth: creating an llms.txt file won't solve every problem with AI scraping your content. But it's simple, quick, and at least puts your intent on record.
Now you know what it is, how to make one, what examples look like, and where tools fit in. My advice? Try it out on your site. If nothing else, you'll be ahead of the curve.
If you don't want to start from scratch, grab one of the templates above. And if you'd rather automate it, check out [my guide to llms.txt tools].
I'm not sure how this standard will look in five years, but if history is any clue, ignoring it completely might not be the best bet. Better to spend five minutes now than scramble later.