The crawl budget myth (and what actually matters)
Most sites will never hit their crawl budget. If your pages aren't indexed, the problem is almost certainly something else — and the real diagnosis is simpler than the SEO Twitter discourse suggests.
Every month I get at least one audit request that opens with some variation of: "We think our crawl budget is maxed out." Every month I look at the logs and find something different. Sometimes it's duplicate content collapsing into the same canonical. Sometimes it's a robots.txt rule nobody remembered. Once it was a developer who'd accidentally shipped noindex on every product page.
It's almost never crawl budget.
What crawl budget actually is
Google has two concepts that get mashed together: crawl rate limit (how fast Googlebot can hit your server without breaking it) and crawl demand (how much Google wants to crawl you). The combination is informally called crawl budget.
The catch: Google explicitly says this only matters for sites with millions of pages, or sites that publish thousands of new URLs a day. For the 99% of sites I work with — SaaS properties with 2,000 to 50,000 URLs — crawl budget is a non-issue.
"If a website has fewer than a few thousand URLs, most of the time it will be crawled efficiently." — Google Search Central docs, paraphrased
What's actually gating your indexation
1. Duplicate & near-duplicate content
Faceted navigation generating 40,000 URL variants of the same page. Parameter-laden URLs. Session IDs. UTM-tagged pages being indexed. Google crawls them, realizes they're duplicates, and stops bothering. Your "crawl budget problem" is a canonicalization problem.
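The first-line fix is usually unambiguous canonicals: every faceted, parameterised, or UTM-tagged variant points back at one clean URL. Here's a minimal sketch of what that looks like on a Next.js 14-style App Router product page; the route, domain, and getProduct helper are hypothetical stand-ins for your own setup.

```tsx
// app/products/[slug]/page.tsx
// Every parameterised or faceted variant of this page declares the same
// canonical, so Google collapses the duplicates where you want them.
import type { Metadata } from "next";

// Hypothetical data helper, standing in for however you load product data.
async function getProduct(slug: string) {
  return { slug, name: "Example product" };
}

export async function generateMetadata(
  { params }: { params: { slug: string } }
): Promise<Metadata> {
  const product = await getProduct(params.slug);
  return {
    title: product.name,
    alternates: {
      // Always the clean URL: no ?color=, ?utm_source=, no session IDs.
      canonical: `https://www.example.com/products/${product.slug}`,
    },
  };
}

export default async function Page({ params }: { params: { slug: string } }) {
  const product = await getProduct(params.slug);
  return <h1>{product.name}</h1>;
}
```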
2. Thin or low-quality content
Google's quality threshold has climbed every year since Panda. Pages that used to index easily in 2018 now sit in "Discovered — currently not indexed" for months. The fix isn't more crawl budget. It's fewer, better pages.
3. Internal linking
If a page is four clicks deep, buried in pagination, with no contextual internal links pointing to it — Google correctly infers it's not important. It will deprioritize crawling it, and often won't index it at all.
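If you want to quantify this rather than eyeball it, compute click depth with a breadth-first search from the homepage over your internal-link graph. A rough sketch, assuming you've already extracted an edge list from a crawl; the example graph below is invented.

```ts
// clickDepth.ts
// Breadth-first search from the homepage over an internal-link edge list.
// Pages that come back with depth > 3, or no depth at all, are the ones
// Google is likely to treat as unimportant.

type LinkGraph = Map<string, string[]>; // URL -> URLs it links to

export function clickDepths(graph: LinkGraph, home: string): Map<string, number> {
  const depth = new Map<string, number>([[home, 0]]);
  const queue: string[] = [home];

  while (queue.length > 0) {
    const url = queue.shift()!;
    for (const target of graph.get(url) ?? []) {
      if (!depth.has(target)) {
        depth.set(target, depth.get(url)! + 1);
        queue.push(target);
      }
    }
  }
  return depth; // URLs missing from the map are orphaned entirely
}

// Example: a product page only reachable through a layer of pagination.
const graph: LinkGraph = new Map([
  ["/", ["/blog", "/products"]],
  ["/products", ["/products?page=2"]],
  ["/products?page=2", ["/products/widget-42"]],
]);
console.log(clickDepths(graph, "/")); // /products/widget-42 -> depth 3
```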
4. Render issues
Client-side rendered content that never makes it into the HTML Googlebot processes on the first pass. Lazy-loaded copy that needs a scroll event Googlebot will never fire. Next.js sites missing proper SSR on dynamic routes. These fail at the rendering stage, before indexation even becomes a question.
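For the Next.js case specifically, the fix is usually just moving the fetch out of a client-side useEffect and into a server component (or getServerSideProps on the pages router), so the copy ships in the HTML Googlebot gets on the first request. A sketch under that assumption; the CMS endpoint is hypothetical.

```tsx
// app/docs/[slug]/page.tsx
// Anti-pattern: fetching the copy in a client-side useEffect leaves the
// initial HTML an empty shell. Fetching in a server component instead
// puts the content in the first response.
async function getDoc(slug: string) {
  // Hypothetical CMS endpoint; replace with your own data source.
  const res = await fetch(`https://cms.example.com/docs/${slug}`, {
    next: { revalidate: 3600 }, // Next.js fetch caching; tune to your needs
  });
  return res.json() as Promise<{ title: string; body: string }>;
}

export default async function DocPage({ params }: { params: { slug: string } }) {
  const doc = await getDoc(params.slug);
  return (
    <article>
      <h1>{doc.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: doc.body }} />
    </article>
  );
}
```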
How to diagnose it instead
1. Check Search Console for "Discovered — not indexed" and "Crawled — not indexed."
2. Sample 20 URLs from each bucket and inspect them manually.
3. Look at log files for crawl patterns: which URL clusters get hit, which don't. (A rough log-parsing sketch follows this list.)
4. Only after all that, consider whether budget is actually the constraint.
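For step 3, you don't need a log analysis suite to get a first read. A rough Node sketch, assuming combined-format access logs; the filename is a placeholder, and the naive user-agent check should be replaced with reverse-DNS verification in a real audit.

```ts
// crawl-clusters.ts
// Count Googlebot hits per top-level URL cluster from a combined-format
// access log. Run: npx ts-node crawl-clusters.ts access.log
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

async function main(logPath: string) {
  const hitsByCluster = new Map<string, number>();
  const rl = createInterface({ input: createReadStream(logPath) });

  for await (const line of rl) {
    if (!line.includes("Googlebot")) continue; // verify via rDNS in real audits
    // Combined log format: ... "GET /products/widget-42?color=red HTTP/1.1" ...
    const match = line.match(/"(?:GET|HEAD) ([^ ?]+)/);
    if (!match) continue;
    const cluster = "/" + (match[1].split("/")[1] ?? "");
    hitsByCluster.set(cluster, (hitsByCluster.get(cluster) ?? 0) + 1);
  }

  const ranked = [...hitsByCluster.entries()].sort((a, b) => b[1] - a[1]);
  for (const [cluster, hits] of ranked) console.log(`${hits}\t${cluster}`);
}

main(process.argv[2] ?? "access.log").catch(console.error);
```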
The exception: programmatic and enterprise sites
If you run a marketplace with 2M product pages, or a publisher with a 500k-URL archive, crawl budget is real. The fixes look different: aggressive Last-Modified headers so unchanged pages can return 304s, disciplined sitemap.xml partitioning, strict parameter handling at the application and robots.txt level (Search Console's URL Parameters tool is gone), and sometimes deliberate noindex on low-value URL clusters to concentrate crawl equity.
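To make the sitemap point concrete: partition by URL cluster or recency, keep each file under the protocol's 50,000-URL cap, and carry an accurate lastmod so Google can focus recrawls where things actually changed. A minimal sketch; the UrlEntry shape and file naming are my own, not any standard API.

```ts
// build-sitemaps.ts
// Partition a large URL set into <=50,000-URL sitemap files plus an index,
// carrying an accurate <lastmod> per URL.
import { writeFileSync } from "node:fs";

interface UrlEntry { loc: string; lastmod: string } // ISO date per URL

export function buildSitemaps(entries: UrlEntry[], baseUrl: string): void {
  const CHUNK = 50_000; // hard per-file limit in the sitemap protocol
  const indexParts: string[] = [];

  for (let i = 0; i * CHUNK < entries.length; i++) {
    const chunk = entries.slice(i * CHUNK, (i + 1) * CHUNK);
    const body = chunk
      .map((e) => `  <url><loc>${e.loc}</loc><lastmod>${e.lastmod}</lastmod></url>`)
      .join("\n");
    writeFileSync(
      `sitemap-${i}.xml`,
      `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${body}\n</urlset>\n`
    );
    indexParts.push(`  <sitemap><loc>${baseUrl}/sitemap-${i}.xml</loc></sitemap>`);
  }

  writeFileSync(
    "sitemap-index.xml",
    `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${indexParts.join("\n")}\n</sitemapindex>\n`
  );
}
```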
But even at scale, "crawl budget" is rarely the root cause. It's the symptom of a bloated URL space that should have been pruned three architecture decisions ago.
What to do Monday morning
- Open Search Console → Pages → "Why pages aren't indexed."
- If "Discovered — not indexed" is your biggest bucket, investigate internal linking and content quality.
- If "Crawled — not indexed" dominates, it's almost always a quality signal. Audit those pages.
- If "Duplicate without user-selected canonical" is anywhere near the top, fix your canonicals before anything else.
- Ignore the crawl budget chatter unless you're north of 1M URLs.
Crawl budget is a compelling narrative because it's a technical-sounding explanation for a frustrating problem. But technical SEO is boring the same way accounting is boring — most of the answers are in the basics, and the basics are unsexy.
Fix the canonicals. Cull the thin pages. Make sure your important URLs are two clicks from the homepage. You'll be surprised how fast indexation improves.