How do I extract just URLs from a sitemap?
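To extract URLs from a sitemap, read every <loc> tag inside the <urlset> root, dedupe them, and output one URL per line. xmlsitemapmaker.com's TXT export does exactly this: paste the sitemap URL, pick TXT as the format, and get back a flat URL list.

From the command line: curl -s https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | sort -u. The -P flag needs a grep with Perl-compatible regex support, such as GNU grep; the BSD grep that ships with macOS does not have it. Note that sort -u dedupes but does not preserve the sitemap's original order.

In Python, use xml.etree.ElementTree and qualify tag names with the sitemap namespace, http://www.sitemaps.org/schemas/sitemap/0.9; an unqualified lookup such as find('loc') matches nothing.

For sitemap index files (root element <sitemapindex>), each <loc> points to a child sitemap, so fetch every child and union their <loc> contents. Always dedupe: sitemap index files occasionally repeat URLs across child files, and on platforms like WordPress the same URL can appear in the pages, posts and image sitemaps simultaneously.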
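A minimal Python sketch of the ElementTree approach, covering both a plain <urlset> and a <sitemapindex> whose children are fetched and unioned, with an order-preserving dedupe. The names parse_sitemap, http_get and extract_urls and the fetcher parameter are hypothetical, chosen for illustration. It assumes uncompressed XML (a .xml.gz sitemap would need gzip.decompress first) and sitemaps from a source you trust, since the standard-library XML parsers are not hardened against maliciously constructed input.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import deque

# Namespace defined by the sitemaps.org protocol; tag lookups must be qualified with it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_bytes):
    """Return (kind, locs): kind is 'urlset' or 'sitemapindex', locs the <loc> texts."""
    root = ET.fromstring(xml_bytes)
    kind = root.tag.split("}")[-1]  # drop the "{namespace}" prefix from the root tag
    if kind == "urlset":
        locs = root.findall("sm:url/sm:loc", NS)  # page URLs
    elif kind == "sitemapindex":
        locs = root.findall("sm:sitemap/sm:loc", NS)  # child sitemap URLs
    else:
        raise ValueError(f"not a sitemap: root element is {root.tag}")
    # Strip surrounding whitespace; skip empty <loc> elements.
    return kind, [loc.text.strip() for loc in locs if loc.text]


def http_get(url):
    """Fetch a URL and return the raw response body (assumes uncompressed XML)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def extract_urls(sitemap_url, fetcher=http_get):
    """Return every page URL reachable from sitemap_url, deduped in first-seen order."""
    urls = {}  # dict keys keep insertion order and make duplicate checks O(1)
    queue = deque([sitemap_url])
    visited = set()  # skips a child sitemap that an index lists more than once
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        kind, locs = parse_sitemap(fetcher(current))
        if kind == "sitemapindex":
            queue.extend(locs)  # union the contents of every child sitemap
        else:
            for loc in locs:
                urls.setdefault(loc, None)  # keeps only the first occurrence
    return list(urls)
```

For a flat one-URL-per-line list, print("\n".join(extract_urls("https://example.com/sitemap.xml"))). Passing a different fetcher makes it easy to test against local files or add headers and retries without touching the parsing logic.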