llms.txt Validator

Check if your llms.txt file contains AI-specific components

Secure Validation

This validator checks your llms.txt file for AI-specific components while maintaining security protections.

llms.txt Components Guide

A complete llms.txt file should include these AI-specific components to guide how AI systems interact with your content:

AI-Specific Components:

  • AI-Model-Allow: model1, model2 - Specifies which AI models can access content
  • AI-Model-Disallow: model3, model4 - Specifies which AI models cannot access content
  • AI-Training-Allow: true/false - Allows or disallows content for AI training
  • AI-Indexing-Allow: true/false - Allows or disallows content for AI indexing
  • AI-Content-Boundary: description - Defines boundaries for AI content use

Optional Components:

  • AI-Contact: [email protected] - Contact for AI-related inquiries
  • AI-License: license-type - Specifies license for AI use of content
  • AI-Attribution: requirements - Specifies attribution requirements
  • AI-Version: 1.0 - Version of the llms.txt specification
  • AI-Usage-Policy: policy details - Detailed policy for AI usage

Complete llms.txt Example:

# llms.txt for example.com
# This file guides AI interactions with our content

# AI Model Permissions
AI-Model-Allow: GPT-4, Claude-2, Bard
AI-Model-Disallow: Jurassic-2, GPT-3

# Content Usage Permissions
AI-Training-Allow: true
AI-Indexing-Allow: true
AI-Content-Boundary: Public content only

# Contact and Attribution
AI-Contact: [email protected]
AI-Attribution: Please cite example.com as the source

# License Information
AI-License: CC-BY-NC-4.0
AI-Usage-Policy: Non-commercial research only

# Specification Version
AI-Version: 1.0

# Optional: Content boundaries
Exclude-Content: Financial data, personal information, private user content

What is llms.txt?

llms.txt is an emerging standard, inspired by robots.txt, that allows websites to communicate permissions and restrictions for large language models (LLMs) and AI crawlers.

How it works

With a simple text file placed at the root of your site (https://example.com/llms.txt), you can:

  • Allow or disallow specific AI models.
  • Control whether your content can be indexed or used for training.
  • Define licensing and attribution requirements.
  • Protect sensitive or private data categories.

By validating your file, you ensure that your preferences are unambiguous and machine-readable.
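
As a concrete illustration, here is a minimal Python sketch of how a crawler or tool might fetch and parse such a file, assuming the simple "Directive: value" line format used in the samples on this page; the function names and behavior are illustrative only, not part of any specification.

# Minimal sketch: fetch /llms.txt from a site root and parse its directives.
# Assumes the simple "Directive: value" format used in the samples below;
# names and behavior here are illustrative, not a reference implementation.
from urllib.request import urlopen

def fetch_llms_txt(domain: str) -> str:
    """Download the llms.txt file from the root of the given domain."""
    with urlopen(f"https://{domain}/llms.txt") as response:
        return response.read().decode("utf-8")

def parse_directives(text: str) -> dict:
    """Turn 'Directive: value' lines into a dict, skipping comments and blanks."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore comments and empty lines
        if ":" in line:
            key, value = line.split(":", 1)
            directives[key.strip()] = value.strip()
    return directives

if __name__ == "__main__":
    sample = "Allow-Training: true\nAllow-Indexing: true\nLicense: CC-BY-4.0\n"
    print(parse_directives(sample))  # {'Allow-Training': 'true', ...}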

How the Validator Works

  • Syntax Check – Parses your file for structural consistency and valid formatting.
  • Directive Validation – Ensures only recognized directives are used.
  • Conflict Detection – Identifies overlapping or contradictory rules.
  • Best Practices – Suggests improvements for clarity and interoperability.
  • Compliance Report – Outputs a human-readable summary of issues and fixes.
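
A rough Python sketch of these checks, using the directive names from the Best Practices section below (an assumption-laden illustration, not the validator's actual implementation):

# Illustrative sketch of the checks listed above (directive validation,
# boolean syntax, conflict detection). Directive names follow the Best
# Practices section below; this is not the tool's actual implementation.
RECOGNIZED = {"Allow-Training", "Allow-Indexing", "Allow-Models",
              "Disallow-Models", "License", "Usage-Policy", "Exclude"}

def validate(directives: dict) -> list:
    issues = []

    # Directive validation: flag anything outside the recognized set.
    for key in directives:
        if key not in RECOGNIZED:
            issues.append(f"Unknown directive: {key}")

    # Syntax check: boolean fields must be exactly true or false.
    for key in ("Allow-Training", "Allow-Indexing"):
        value = directives.get(key)
        if value is not None and value not in ("true", "false"):
            issues.append(f'{key}: expected true/false, got "{value}"')

    # Conflict detection: a model listed as both allowed and disallowed.
    allowed = {m.strip() for m in directives.get("Allow-Models", "").split(",") if m.strip()}
    disallowed = {m.strip() for m in directives.get("Disallow-Models", "").split(",") if m.strip()}
    for model in sorted(allowed & disallowed):
        issues.append(f"Conflict: {model} is both allowed and disallowed")

    return issues

print(validate({"Allow-Indexing": "Yes", "Allow-Models": "GPT-4",
                "Disallow-Models": "GPT-4"}))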

Validation Report Example


-------------------------------------------------
✓ Syntax valid
✓ Allow-Models directive recognized
✓ License field correctly formatted (CC-BY-NC-4.0)

Warnings:
- Line 8: Invalid boolean value for Allow-Indexing ("Yes") → expected true/false
- Line 20: License identifier "Creative Commons 4.0" not recognized → suggest SPDX format

Best Practices for llms.txt

  • Place your llms.txt file at the root of your domain (e.g., example.com/llms.txt).
  • Use standardized directives such as Allow-Training, Disallow-Models, and License.
  • Ensure boolean values are always true or false.
  • Use valid SPDX license identifiers (e.g., CC-BY-4.0, MIT).
  • Keep directives clear and unambiguous to avoid conflicts.
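
The boolean and license points above can be checked mechanically. Below is a small Python sketch; the KNOWN_SPDX set is an illustrative subset, not the full SPDX license list.

# Sketch of a License best-practices check. KNOWN_SPDX is a small
# illustrative subset; a real validator would use the full SPDX list.
import re

KNOWN_SPDX = {"CC-BY-4.0", "CC-BY-NC-4.0", "CC0-1.0", "MIT", "Apache-2.0"}

def check_license(value: str):
    """Return a warning string, or None if the value looks like a known SPDX id."""
    if value in KNOWN_SPDX:
        return None
    if re.fullmatch(r"[A-Za-z0-9.+-]+", value):
        return f'License "{value}" is not in the known SPDX list; verify the identifier'
    return f'License "{value}" is not SPDX-formatted (e.g., use CC-BY-4.0 rather than "Creative Commons 4.0")'

print(check_license("CC-BY-4.0"))             # None
print(check_license("Creative Commons 4.0"))  # formatting warning, as in the report example above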

Sample Files

Basic Permissions


# Allow indexing and training
Allow-Training: true
Allow-Indexing: true
License: CC-BY-4.0

Restricted Use (Non-Commercial)


Allow-Training: true
Usage-Policy: Non-commercial research only
License: CC-BY-NC-4.0

Excluding Sensitive Data


Exclude: Financial data
Exclude: Personal information
Exclude: Private user content

Model-Specific Permissions


Allow-Models: GPT-4, Claude-2
Disallow-Models: GPT-3, Jurassic-2
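
How a crawler should interpret these two directives together is not yet standardized; the Python sketch below shows one plausible reading, in which an explicit Disallow-Models entry wins and a non-empty Allow-Models list excludes any model not named in it.

# Sketch of one plausible interpretation of Allow-Models / Disallow-Models.
# The precedence rules here are an assumption; the draft specification does
# not yet define them.
def model_allowed(directives: dict, model: str) -> bool:
    allowed = {m.strip() for m in directives.get("Allow-Models", "").split(",") if m.strip()}
    disallowed = {m.strip() for m in directives.get("Disallow-Models", "").split(",") if m.strip()}
    if model in disallowed:
        return False             # an explicit disallow always wins
    if allowed:
        return model in allowed  # a non-empty allow list excludes everything else
    return True                  # no allow list: default to allowed

rules = {"Allow-Models": "GPT-4, Claude-2", "Disallow-Models": "GPT-3, Jurassic-2"}
print(model_allowed(rules, "GPT-4"))       # True
print(model_allowed(rules, "Jurassic-2"))  # False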

FAQ

Is llms.txt legally binding?

No. It is a machine-readable signal. While not enforceable by law, it provides transparency and aligns with industry best practices.

Where should I place the file?

At the root of your domain: https://example.com/llms.txt.

What if I already use robots.txt?

You can use both. robots.txt is for search engines, while llms.txt is for AI models and crawlers.

Will all AI crawlers respect this file?

Not yet. Adoption is growing, and many major providers are testing support.

Roadmap, Standards, and Validator Benefits

Roadmap & Standards Alignment

The llms.txt Validator is aligned with ongoing industry efforts, including:

  • llmstxt.org Draft Specification
  • AI Content Permission Protocol (AICP)
  • Existing web standards such as the Robots Exclusion Protocol (RFC 9309)

As the ecosystem evolves, the validator will be updated to reflect the latest conventions and directives.

Why Use This Validator?

  • Ensure your llms.txt is syntactically valid and easy to interpret.
  • Avoid conflicting rules that could make your preferences unenforceable.
  • Gain confidence that your website’s AI permissions are communicated effectively.
  • Stay aligned with industry best practices and future regulations.