Your Sitemap Is Doing the Opposite of What You Think

Every URL in your sitemap is a statement to Google about what deserves to be indexed. Most sites are confessing more than they realize.

You submit 48K URLs in your XML sitemap. 11K get indexed. You keep asking why Google won't crawl more.

You have it backwards. Google crawled plenty. It just disagreed with your definition of "worth indexing" on 77% of what you submitted.

The pattern is almost always the same. Sitemaps full of pages nobody would defend in a content review, submitted on autopilot, signaling to Google that the overall quality of your site is weak.

You're probably treating your sitemap as a list. A bunch of URLs Google should know about. Send everything, let Google sort it out.

That's not what a sitemap is.

Every URL in your sitemap is a vouch

When you submit a URL, you're telling Google four things about it. This URL exists. It's the canonical version. It can be indexed. It's worth showing in search results.

You're not asking. You're vouching.

Submit 48K URLs and you've vouched 48K times. If Google rejects 37K of them, you've handed it a written record of pages you put your name on that didn't make the cut. The next URL you publish starts with that handicap. Google has already learned your judgment can't be trusted.

This is the shift most SEOs miss. Your sitemap isn't a request for Google to do more work. It's a curated list of pages you're willing to be judged on.

Once you accept that, your default flips from "include everything we can" to "include only what we'd defend."

What your CMS quietly puts in for you

Most sitemaps weren't curated. They were generated.

A plugin scanned your database, found every URL the system knew about, and dumped it into /sitemap.xml. Nobody checked the output.

So your sitemap ends up with tag pages nobody reads. Author archives for writers who left years ago. Paginated archives like /blog/page/47. Faceted URLs that combine four filters nobody searches for. Internal search results. Parameter URLs that duplicate canonical content. Expired products. Near-empty category pages with two products in them.

None of these pages deserve a spot in search results. Just because they exist on your site doesn't mean Google should index them.

Your first move on any sitemap audit is removing them. Not redirecting, not adding noindex. Just removing them from the sitemap. Keep them crawlable through internal links if they have any navigation value. Otherwise let them fade.
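A minimal sketch of that first pass, assuming your low-value pages follow recognizable URL patterns. The patterns and the names is_low_value and curate are hypothetical; swap in whatever matches your own site structure.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns for one site; adjust to your own URL structure.
LOW_VALUE_PATTERNS = [
    re.compile(r"^/tag/"),        # tag archives
    re.compile(r"^/author/"),     # author archives
    re.compile(r"/page/\d+/?$"),  # paginated archives like /blog/page/47
    re.compile(r"^/search"),      # internal search results
]


def is_low_value(url: str) -> bool:
    """True if the URL matches a pattern you would not defend in search."""
    parts = urlsplit(url)
    if parts.query:
        # Assumption: parameter URLs duplicate a canonical page on this site.
        return True
    return any(pattern.search(parts.path) for pattern in LOW_VALUE_PATTERNS)


def curate(urls):
    """Keep only the URLs that survive the pattern filter."""
    return [url for url in urls if not is_low_value(url)]
```

This only decides what goes into the sitemap. The filtered pages stay live and crawlable through internal links.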

The basics most sitemaps quietly fail

Most audits I run fail before we even get to content quality. The basics are wrong.

Every URL must return 200. Not 301, not 302, not 404, not 5xx. A redirected URL tells Google your sitemap is out of date. The URL you're vouching for doesn't even live at that address anymore. If you submit /old-product/ and it redirects to /new-product/, you've created a contradiction. Submit the final URL directly.
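One way to check this yourself, sketched with Python's standard library. The key detail is refusing to follow redirects, so a 301 shows up as a 301 instead of the 200 at its destination. The names fetch_status and verdict are my own, and some servers answer HEAD requests differently from GET, so treat the result as a first pass.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


OPENER = urllib.request.build_opener(NoRedirect)


def fetch_status(url, timeout=10.0):
    """Return the raw status code for url without following redirects."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with OPENER.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def verdict(status):
    """Map a status code to what should happen to the sitemap entry."""
    if status == 200:
        return "keep"
    if 300 <= status < 400:
        return "replace with the final URL"
    return "fix or remove"
```

Calling verdict(fetch_status(url)) for each sitemap entry gives you a per-URL decision list.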

Every URL must be indexable. No noindex meta tag, no noindex in the X-Robots-Tag header. This is the most common mistake I find. Your page has noindex set by some plugin, your sitemap generator doesn't check, and the URL gets submitted anyway. You're telling Google two opposite things at once. Google follows the page directive and stops trusting your sitemap.
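A rough sketch of the check your sitemap generator should be doing, assuming you already have the page HTML and response headers in hand. The function name has_noindex is hypothetical. It looks at the robots and googlebot meta tags plus the X-Robots-Tag header, treats "none" as noindex, and ignores user-agent prefixes inside the header value, so it is a simplification.

```python
from html.parser import HTMLParser

BLOCKING = {"noindex", "none"}  # "none" is shorthand for noindex, nofollow


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers):
    """True if the meta tags or the X-Robots-Tag header say noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return bool(BLOCKING & set(parser.directives + header_directives))
```

Any URL where this returns True should be dropped from the sitemap, or have the noindex removed if it was set by accident.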

Every URL must be its own canonical. If your sitemap says /product?color=red and the page's canonical points to /product, you have a conflict. The wrong URL often gets indexed. Or neither does.
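Here is a small sketch of the self-canonical check, assuming you have the page HTML. The name is_self_canonical is my own, and the comparison is a strict string match after resolving relative hrefs, so a trailing-slash difference counts as a conflict. Treating a missing canonical tag as "no conflict" is an assumption you may want to tighten.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Capture the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


def is_self_canonical(sitemap_url, html):
    """True if the page's canonical points back at the submitted URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return True  # assumption: no canonical tag means no conflict
    return urljoin(sitemap_url, parser.canonical) == sitemap_url
```

For the example above, /product?color=red with a canonical of /product fails the check, which is exactly the entry to replace with /product.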

Every URL must be allowed by robots.txt. Submitting URLs you've blocked from crawling is a contradiction Google's documentation explicitly calls out. I still find it on roughly one in three audits.
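A quick way to catch this, sketched with Python's built-in urllib.robotparser. The function name blocked_urls is hypothetical. One caveat: the standard library parser does simple prefix matching and may not handle wildcards or rule precedence the way Googlebot does, so confirm anything it flags in Search Console.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text you already fetched
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Pass in the contents of your robots.txt and the full list of sitemap URLs. Anything returned is a URL you are submitting and blocking at the same time.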

Lastmod must be accurate or omitted. The signal is binary. Google either trusts it or it doesn't. If your CMS stamps every URL with today's date every time it regenerates the sitemap, you've broken lastmod for your entire domain. An inaccurate lastmod is worse than no lastmod at all.
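A simple heuristic for spotting a CMS that stamps everything with the regeneration date: measure how many URLs share the single most common lastmod day. This is a sketch, the name lastmod_share is my own, and a high share is a reason to investigate rather than proof.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmod_share(sitemap_xml):
    """Return (most common lastmod day, share of lastmod entries using it)."""
    root = ET.fromstring(sitemap_xml)
    # Keep only the YYYY-MM-DD part so timestamps on the same day group together.
    days = [
        (element.text or "").strip()[:10]
        for element in root.findall("sm:url/sm:lastmod", NS)
    ]
    if not days:
        return None, 0.0
    day, count = Counter(days).most_common(1)[0]
    return day, count / len(days)
```

If nearly every URL reports the same day and that day is the last time the sitemap was rebuilt, the values are probably generated rather than real.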

Sitemap files cap at 50K URLs or 50MB uncompressed. Personally, I'd go much lower than that. I see processing issues all the time when sites push close to the limit. Files get partially read or skipped entirely in Search Console. Use a sitemap index and keep each file well under the cap.
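Splitting into smaller files plus an index could look roughly like this. The 10,000 per-file figure is an arbitrary margin I picked for illustration, not a documented threshold, and the function names are hypothetical.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 10_000  # assumption: an arbitrary margin under the 50,000 cap


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split the URL list into consecutive groups of at most size entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_xml(urls):
    """Build one sitemap file; escape() handles & and < in URLs."""
    entries = "".join(f"<url><loc>{escape(url)}</loc></url>" for url in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'
    )


def index_xml(sitemap_urls):
    """Build the sitemap index that points at each child sitemap file."""
    entries = "".join(
        f"<sitemap><loc>{escape(url)}</loc></sitemap>" for url in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'
    )
```

Write each chunk to its own file, then submit only the index in Search Console.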

None of this is new. All of it is in the documentation. Yet I find at least one violation on virtually every sitemap I open.

Less submitted, more indexed

The relationship between sitemap size and indexation is often inverse.

I've seen sites cut their sitemap by 80% and watch indexation rise. The pages that survive the cut stop competing with thousands of weak ones for Google's attention.

This is hard to accept. Your instinct when indexation is bad is to push for more. More URLs, more sitemaps. The right move is almost always the opposite.

A sitemap of 15K URLs that indexes at 96% is stronger than yours at 48K URLs indexing at 23%. Always. The first one is a recommendation Google trusts. The second is noise Google has learned to ignore.
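The arithmetic behind that comparison, as a quick sketch. The 48K and 11K figures are the ones from the opening example, and the 15K at 96% is the hypothetical curated sitemap above.

```python
def indexation_rate(indexed, submitted):
    """Share of submitted URLs that ended up indexed."""
    return indexed / submitted


def indexed_pages(submitted, rate):
    """Number of indexed URLs implied by a sitemap size and a rate."""
    return round(submitted * rate)


current_rate = indexation_rate(11_000, 48_000)
curated_indexed = indexed_pages(15_000, 0.96)

print(f"{current_rate:.0%}")  # 23%
print(curated_indexed)        # 14400
```

The smaller sitemap doesn't just have the better ratio. At 96% it would put 14,400 pages in the index, more than the 11,000 the bloated one manages.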

How to actually fix it

Start by segmenting your sitemap. Categories in one, products in another, articles in a third. It doesn't help crawling, but it makes your indexation problem visible by template once you start measuring.
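Segmenting can be as simple as grouping URLs by path prefix, assuming your templates live under predictable paths. The prefixes and segment names below are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical prefix-to-segment mapping; replace with your own templates.
SEGMENTS = {
    "/category/": "categories",
    "/product/": "products",
    "/blog/": "articles",
}


def segment(urls):
    """Group URLs by template so each group can become its own sitemap file."""
    groups = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path
        name = next(
            (label for prefix, label in SEGMENTS.items() if path.startswith(prefix)),
            "other",  # anything unmatched lands here for manual review
        )
        groups[name].append(url)
    return dict(groups)
```

Each key becomes one sitemap file, for example sitemap-products.xml, listed in your sitemap index.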

Open Google Search Console. The Sitemaps report shows what Google parsed from each sitemap. The Page Indexing report, filtered by sitemap, shows how many URLs from each actually made it into the index. Compare the two. When your category pages index at 95% and your product pages at 35%, you know exactly which template to fix.
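Once you've copied the submitted and indexed counts for each sitemap out of Search Console, the comparison is a few lines. This is a sketch: the function name weak_templates, the 50% threshold, and the sample counts are all made up for illustration.

```python
def weak_templates(report, threshold=0.5):
    """Return sitemap names indexing below threshold, worst first.

    report maps a sitemap name to a (submitted, indexed) pair.
    """
    rates = {
        name: indexed / submitted
        for name, (submitted, indexed) in report.items()
        if submitted  # skip empty sitemaps to avoid dividing by zero
    }
    return sorted(
        (name for name, rate in rates.items() if rate < threshold),
        key=rates.get,
    )


# Illustrative counts only, typed in by hand from the two reports.
report = {
    "categories": (200, 190),     # 95%
    "products": (10_000, 3_500),  # 35%
    "articles": (800, 600),       # 75%
}
print(weak_templates(report))  # ['products']
```

The output is your shortlist of templates to fix or stop submitting.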

For ongoing tracking, my own tool VitalSentinel monitors indexation specifically for URLs in your sitemap, so you spot template-level drift before it gets worse. Other monitoring tools work too.

Run your sitemap through Screaming Frog in list mode. Flag every URL that isn't 200, that has a noindex, that has a canonical pointing elsewhere, that's blocked by robots.txt. Remove them all.

Then the harder call. Look at the templates with weak indexation that also aren't pulling impressions or traffic. Those are your candidates for removal. Don't redirect, don't noindex. Just stop vouching for them.

The pattern, again

Same conclusion as everywhere else in technical SEO right now. No new format. No magic tag. Just URLs that meet the basics and content worth defending.

Google was patient with messy sitemaps for over a decade. Sites learned they could submit everything and let Google sort it out. Those days are ending. Site-wide quality is part of core ranking now. Every URL you submit feeds that calculation.

Your sitemap isn't a list of pages that exist. It's a list of pages you're willing to be judged on.

Submit the list you'd defend.

Martin Stepanek

Technical SEO & Web Performance Consultant

With 10+ years building and optimizing websites, I've learned that technical excellence drives business success. I help companies maximize their website's potential through strategic technical SEO and performance improvements that create better experiences for users and stronger results for businesses.
