Key Takeaways:
- An XML sitemap helps search engines find all important pages on your website
- Only indexable pages with status code 200 belong in the sitemap
- Submission through Google Search Console accelerates indexing of new content
Imagine your website as a massive department store. Without signage, visitors wander around and might never find the section they're looking for. An XML sitemap is exactly that kind of guide, except it serves search engine crawlers instead of people.
Google finds most pages through links. But what about new pages without incoming links? Or content buried deep in your website structure? This is where the XML sitemap comes in: It lists all important URLs and tells search engines exactly where to look.
What an XML Sitemap Actually Does
An XML sitemap is a structured file in XML format that lists all URLs you want search engines to index. It contains optional metadata like the last modification date, change frequency, and relative priority of a page.
The sitemap doesn't guarantee indexing. Google still decides independently which pages to include. But it helps ensure the crawler actually finds all pages in the first place. Especially for large websites with thousands of subpages, this makes a significant difference.
When is a sitemap particularly important? New websites without many backlinks lack the external signals that lead Google to their pages. On very large websites with complex structures, the crawler might overlook entire sections without help. And after a website relaunch, the sitemap helps communicate the new structure quickly.
The Anatomy of an XML Sitemap
A sitemap follows a standardized format. The basic structure looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.example.com/page</loc>
<lastmod>2026-01-02</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
The loc element is required and contains the complete URL. All other elements are optional. lastmod shows the date of the last modification; Google uses this information to decide whether a fresh crawl is necessary, but only if the dates are consistently accurate. changefreq signals how often a page typically changes, and priority indicates relative importance within your website, with 1.0 being the highest and 0.0 the lowest value. Keep in mind that Google has stated it ignores changefreq and priority, so don't expect these two fields to influence crawling.
| Element | Required | Function |
|---|---|---|
| loc | Yes | Complete URL of the page |
| lastmod | No | Date of last modification |
| changefreq | No | Expected change frequency |
| priority | No | Relative priority (0.0-1.0) |
Which Pages Belong in the Sitemap
The golden rule: only pages that can and should be indexed belong in the sitemap. This sounds simple but is often done wrong.
Pages that return status code 200, carry a self-referencing canonical tag, and aren't blocked by noindex definitely belong. Your most important landing pages, product pages, blog articles, and category pages form the core of the sitemap.
What doesn't belong is equally important. Pages with noindex tags, URL variants with parameters, paginated pages from page 2 onwards, search result pages, and duplicates must not appear in the sitemap. 404 error pages or redirects have no place there either.
A common mistake: The sitemap is automatically generated and contains everything the CMS knows. This leads to bloated sitemaps with irrelevant URLs. Google then has to figure out what's important on its own – wasting crawl budget.
Sitemap Types for Different Content
Besides the classic URL sitemap, there are specialized formats for different media types.
The image sitemap supplements URLs with information about embedded images. This is relevant for websites wanting to generate traffic through Google Image Search. Each embedded image gets its own entry with its image URL; metadata fields such as title, caption, and license that used to be part of the format are no longer evaluated by Google.
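A minimal sketch of what such an entry looks like, with placeholder URLs: a regular url entry is extended by the image namespace and one image:image block per embedded image.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.example.com/product</loc>
    <image:image>
      <!-- Google only evaluates the image URL itself -->
      <image:loc>https://www.example.com/images/product-photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>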
Video sitemaps work similarly and are essential for websites with video content. Here you specify thumbnail, title, description, duration, and the location of the video file or player. Google uses this data for video search and video rich snippets.
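A sketch of a single video entry, again with placeholder URLs; besides the fields above, Google expects the location of the video file (video:content_loc) or of the player (video:player_loc).
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://www.example.com/video-page</loc>
    <video:video>
      <video:thumbnail_loc>https://www.example.com/thumbs/example.jpg</video:thumbnail_loc>
      <video:title>Example video title</video:title>
      <video:description>Short description of what the video shows.</video:description>
      <video:content_loc>https://www.example.com/videos/example.mp4</video:content_loc>
      <!-- Duration in seconds -->
      <video:duration>120</video:duration>
    </video:video>
  </url>
</urlset>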
News sitemaps are specifically for news websites. They contain articles from the last 48 hours along with publication name, publication date, and title. Only websites approved for Google News should use this format.
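A minimal news entry might look like this (publication name and URLs are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.com/news/example-article</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2026-01-02</news:publication_date>
      <news:title>Example article headline</news:title>
    </news:news>
  </url>
</urlset>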
Creating a Sitemap Properly
Automatic Generation by CMS
Most content management systems generate sitemaps automatically. WordPress has used a built-in sitemap at /wp-sitemap.xml since version 5.5. Plugins like Yoast SEO or Rank Math offer extended control over included content.
The advantage of automatic generation: New pages are immediately included, deleted pages disappear. The disadvantage: You have less control and need to check default settings.
Manual Creation
For static websites, or when you want maximum control, create the sitemap manually. A simple text editor is sufficient. Make sure the XML is well-formed and UTF-8 encoded.
For larger websites, tools like Screaming Frog or Sitebulb help. These tools crawl your website and export a complete sitemap. The advantage is that only reachable pages are included.
Sitemap Index for Large Websites
A single sitemap may contain at most 50,000 URLs and must not exceed 50 MB uncompressed. Larger websites use a sitemap index that points to multiple individual sitemaps.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.example.com/sitemap-products.xml</loc>
</sitemap>
<sitemap>
<loc>https://www.example.com/sitemap-blog.xml</loc>
</sitemap>
</sitemapindex>
This split also makes monitoring easier: Separate sitemaps for blog, products, and categories show in Google Search Console which areas have indexing problems.
Submitting to Google
Registering the sitemap with Google significantly accelerates crawling. Open Google Search Console, navigate to "Sitemaps," and enter your sitemap URL.
After submission, you'll see the status. "Success" means Google could read the sitemap. The number of discovered URLs shows how many pages Google found. The indexing status shows how many are actually in the index.
Discrepancies between submitted and indexed URLs are normal. Google doesn't automatically index everything. But large differences indicate problems: duplicate content, weak content, or technical blocks.
Common Mistakes and How to Avoid Them
Non-canonical URLs in the sitemap are the most common mistake. If a page has a canonical pointing to another URL, only the canonical version belongs in the sitemap. Otherwise, you're sending contradictory signals.
Outdated URLs often remain in automatically generated sitemaps. After deleting a page, it should also disappear from the sitemap. Regularly check that all URLs return status code 200.
Incorrect lastmod dates undermine Google's trust. If you update all dates with every build even though content hasn't changed, Google eventually ignores this information completely.
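A single entry as it should look in practice, assuming the content really changed on that date; the sitemap protocol accepts both a plain date and a full W3C Datetime timestamp:
<url>
  <loc>https://www.example.com/guide</loc>
  <!-- Only update this when the content actually changed, not on every build -->
  <lastmod>2026-01-02T14:30:00+01:00</lastmod>
</url>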
Oversized sitemaps with irrelevant URLs waste crawl budget. Every URL that Google crawls and classifies as unimportant is a missed opportunity for more important pages.
Connecting Sitemap and robots.txt
The robots.txt file can reference your sitemap. Add this line at the end of the file:
Sitemap: https://www.example.com/sitemap.xml
This has two advantages: Search engines find the sitemap automatically without you having to submit it manually. And you have a central place where the sitemap location is documented.
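In a minimal robots.txt, the placement might look like this; the Sitemap directive stands on its own and is independent of the user-agent rules above it:
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml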
Regular Maintenance
A sitemap isn't a one-time project. It needs continuous care.
Check the indexing status monthly in Search Console. Compare submitted and indexed URLs. Investigate pages that aren't indexed despite sitemap entries.
After major website changes – new sections, changed URL structures, redesigns – update the sitemap and resubmit it. This signals to Google that something important has changed.
Use our SEO Analyzer to identify technical problems on your website that might also affect your sitemap effectiveness.
Frequently Asked Questions
Does every website need an XML sitemap?
Small websites with fewer than 100 pages and good internal linking often get by without a sitemap. Google finds all pages through links. But a sitemap can never hurt – the effort is minimal and it offers valuable insights in Search Console.
How often should I update the sitemap?
For dynamic websites with frequently new content, the sitemap should be updated automatically. For static websites, an update after content changes is sufficient. The lastmod date should only be changed when content has actually changed.
Can a faulty sitemap harm my rankings?
A faulty sitemap doesn't directly lead to ranking losses. Google simply ignores invalid entries. But you're missing potential: Pages are indexed more slowly, and you miss valuable diagnostic options in Search Console.