GUIDE // 01

robots.txt + sitemap, done right.

A no-fluff explanation of where to put your sitemap reference, what URL format Google expects, and the dumb mistakes that hide your sitemap from crawlers.

The one rule

Put a Sitemap: line in robots.txt with the absolute URL of your sitemap. No User-agent block required — it's a top-level directive.

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
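If you want to verify the directive programmatically, Python's standard library already parses it. A minimal sketch using `urllib.robotparser` (the robots.txt body is the example above; `site_maps()` requires Python 3.8+):

```python
from urllib import robotparser

# The robots.txt body from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = robotparser.RobotFileParser()
# parse() accepts an iterable of lines.
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns every Sitemap: URL found, or None if there are none.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

Because the directive is top-level, the parser finds it whether or not any User-agent block precedes it.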

Multiple sitemaps

List each one on its own line. The protocol sets no cap on Sitemap: lines, but Google only processes the first 500 KiB of robots.txt, so keep the file lean.

Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-images.xml

Common mistakes

  • Relative path. Sitemap: /sitemap.xml isn't valid — Googlebot ignores it. Always use the full URL.
  • Different host without cross-submission. Hosting example.com's sitemap on cdn.example.net only works because of the robots.txt reference: under the Sitemaps protocol, listing the URL in example.com's robots.txt is what proves ownership. Without that reference, crawlers ignore URLs outside the sitemap's own host. Same-host is the safe default.
  • http vs https mismatch. If your site is on https, the sitemap URL should be too.
  • Disallowing the sitemap path. Crawlers still need to fetch it — don't Disallow: /sitemap.xml.
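These checks are mechanical, so they're easy to script. A sketch in Python; the `lint_sitemap_line` helper and its messages are made up for illustration, not part of any standard:

```python
from urllib.parse import urlparse

def lint_sitemap_line(sitemap_url: str, robots_txt_url: str) -> list[str]:
    """Flag the common mistakes above for a single Sitemap: value (hypothetical helper)."""
    problems = []
    sm, rb = urlparse(sitemap_url), urlparse(robots_txt_url)
    if not sm.scheme or not sm.netloc:
        # Relative paths have no scheme/host and get ignored by Googlebot.
        problems.append("relative path: use the full URL")
    else:
        if sm.netloc != rb.netloc:
            problems.append("different host: needs cross-submission to work")
        if sm.scheme != rb.scheme:
            problems.append("scheme mismatch: http vs https")
    return problems

print(lint_sitemap_line("/sitemap.xml", "https://example.com/robots.txt"))
# ['relative path: use the full URL']
```

Checking the Disallow rules against the sitemap path would need the parsed robots.txt as well, which is why the checker below fetches the whole file.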

Check that yours works

Drop your domain into the sitemap checker — it'll fetch robots.txt, parse every Sitemap: line, and confirm each one actually serves a sitemap.
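The same flow is easy to run yourself. A rough standard-library sketch, assuming nothing beyond `urllib` (`extract_sitemaps` and `check_site` are hypothetical names, and real code would want timeouts and error handling):

```python
import re
import urllib.request

def extract_sitemaps(robots_body: str) -> list[str]:
    # Sitemap: is case-insensitive and can appear anywhere in the file.
    return re.findall(r"(?im)^\s*sitemap:\s*(\S+)", robots_body)

def check_site(origin: str) -> dict[str, bool]:
    """Fetch robots.txt, then confirm each Sitemap: URL answers 200 with XML."""
    with urllib.request.urlopen(f"{origin}/robots.txt") as resp:
        sitemaps = extract_sitemaps(resp.read().decode("utf-8", "replace"))
    results = {}
    for url in sitemaps:
        with urllib.request.urlopen(url) as resp:
            ctype = resp.headers.get("Content-Type", "")
            results[url] = resp.status == 200 and "xml" in ctype
    return results
```

The content-type check is a heuristic; a stricter checker would parse the response body as XML and look for a `<urlset>` or `<sitemapindex>` root.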

Frequently asked

Do I have to declare the sitemap in robots.txt?
No, but you should. The alternative is submitting it by hand in Google Search Console and Bing Webmaster Tools every time you add a new sitemap file. The robots.txt line lets every crawler discover it automatically.
Can the Sitemap: line be the only thing in robots.txt?
Yes. A robots.txt that contains just Sitemap: … is valid and useful — search engines pick up the reference even without any User-agent rules.
What if my robots.txt is blocked or 404s?
On a 404, crawlers fall back to no rules. They can still find your sitemap if you submitted it in Search Console, but discovery via robots.txt is lost. A persistent 5xx is worse: Google may treat the whole site as disallowed while the error lasts. Always serve a 200 robots.txt, even if it's effectively empty.