This Robots.txt checker shows the live robots.txt from any site you enter. Paste a homepage URL to view the file exactly as crawlers see it, then copy or share the content.

Why you should check your robots txt file

Your robots.txt steers crawlers away from private areas and toward the pages you want discovered. Regular checks help you prevent unwanted crawling of sensitive paths, improve crawl efficiency across your site, and avoid wasting crawl budget on duplicate or low-value sections. When your site structure changes or you add new sections, a quick look at the live file confirms that your directions to crawlers still match your intent.

Robots.txt gives crawlers simple rules to follow, and that clarity helps search engines understand your site faster. Keeping the file clean and current supports a healthy presence in search and reduces surprises in how your site is crawled.

SEO tips for robots txt

Your robots.txt should allow what helps discovery and block only what needs privacy or noise control. Use Disallow with care so important pages remain available, and let crawlers fetch the CSS and JavaScript that render your layout and content. Add your XML sitemap with a full absolute URL so crawlers can find every section quickly. Review the file whenever you ship a major change and test after updates so you catch mistakes before they affect visibility.
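As a sketch of those tips in one file, the lines below block a hypothetical private area, keep its CSS and JavaScript assets reachable, and list a sitemap with an absolute URL. The /private/ paths and the domain are placeholders to swap for your own, not defaults any platform requires.

User-agent: *
Disallow: /private/
Allow: /private/assets/
Sitemap: https://www.yoursite.com/sitemap.xml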

Introduction to robots txt

Robots.txt is a small text file that sits at your site root and tells crawlers which areas they may visit. A good configuration supports technical SEO by keeping thin or private areas out of the crawl while leaving valuable content open. Treat the file as an access guide rather than an indexing switch, and manage indexing with page-level meta tags or response headers where needed.

How to create a robots txt file

Robots.txt is easy to create and update with a plain text editor. Write your directives, save the file as robots.txt and place it in the root of your site so it is available at https://www.yoursite.com/robots.txt. Keep one live file per host to avoid confusion and remember that crawlers fetch the file from the same host as the pages they crawl.

User-agent: *
Disallow: /private/

Common directives explained

Robots.txt uses a few simple directives that most platforms support. These are the ones you will use most often, each with a short plain-language description; a combined example follows the list.

  • User-agent: Names the crawler a rule applies to. A star means any crawler; a named agent targets a specific bot.
  • Disallow: Blocks access to a path. Use leading slashes and match the case of your URLs.
  • Allow: Creates an exception to a broader block. A specific allow for a file inside a blocked folder still permits that file.
  • Sitemap: Points to your XML sitemap so crawlers can find URLs faster. Use full absolute links.
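Put together, a short file using all four directives might look like the sketch below. The named agent, paths, and sitemap URL are illustrative placeholders rather than recommended values.

User-agent: *
Disallow: /drafts/
Allow: /drafts/published.html

User-agent: Googlebot
Disallow: /archive/

Sitemap: https://example.com/sitemap.xml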

Examples you can adapt

These short patterns cover common needs. Start with the closest match and adjust paths to suit your site. Test after publishing to confirm the live file is as expected.

Allow all crawlers

Public sites that rely on page-level meta tags for indexing control can keep things simple and leave all paths open for crawling.

User-agent: *
Disallow:

Block admin while allowing uploads

Content sites often block the admin area while leaving media and async actions available, which keeps private routes out of the crawl and allows assets to load.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Reduce crawl of messy parameter URLs

Sites with many parameter pages can trim crawl noise by blocking URLs that end in a bare question mark while leaving clean URLs open. To block every parameterized URL instead, drop the trailing $ so the rule reads Disallow: /*?.

User-agent: *
Disallow: /*?$

Target one folder and one file

If you have a legacy folder or a one-off path to keep out of the crawl, block it directly and leave the rest of the site open.

User-agent: *
Disallow: /private/
Disallow: /cart/checkout.html

Add sitemap lines

Listing sitemaps in robots.txt makes discovery easier. Include the index if your platform splits sitemaps by type.

Sitemap: https://example.com/sitemap_index.xml
Sitemap: https://example.com/page-sitemap.xml

Troubleshooting checks

Robots.txt needs to be reachable and current. Confirm the live file loads at the root over HTTPS, check that your CDN or proxy is not serving an old copy, and make sure staging rules did not ship to production. If the file you see in your editor does not match the file in the viewer, republish and clear any caches, then test again.
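If you work from a terminal and have curl installed, one quick way to run these checks is to request the file directly; swap your own domain in place of the placeholder below.

# Show the status code, content type, and any cache headers added by a CDN
curl -I https://www.yoursite.com/robots.txt

# Print the body so you can compare it with the file in your editor
curl -s https://www.yoursite.com/robots.txt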

Frequently asked questions

Where should robots.txt live

Robots.txt belongs at the root at /robots.txt on the same host as your pages. Keep one file per host, and remember that each subdomain is a separate host that serves its own robots.txt.

What status code is correct for the file

A present file should return 200, and a missing file should return 404 or 410, which crawlers treat as no restrictions. Both outcomes are acceptable: serve a file with rules when you need them, or let the URL return 404 when you want the whole site open to crawling.

Does robots.txt control indexing

Robots.txt controls crawling rather than indexing. Use a meta robots tag or an X-Robots-Tag response header when you need to prevent a page from appearing in search.
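For reference, the two standard page-level signals look like this; how you add the response header depends on your server or application.

In the page HTML:
<meta name="robots" content="noindex">

As an HTTP response header:
X-Robots-Tag: noindex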

Which rule applies when rules seem to clash

The most specific match applies for modern crawlers. A direct allow for a single file can override a broader folder block.

What this tool does and does not do

This tool fetches and displays the live robots.txt from the domain you enter. It does not test individual URLs, simulate user agents, or request a recrawl. Use your search console or a dedicated tester for those tasks and return here when you simply need to see the live file.