A disallow directive in your robots.txt file is the fastest way to accidentally knock pages out of Google Search. This diagnostic workflow walks you through GSC's robots.txt tooling and live URL testing to find blocked paths, fix them, and verify that the block is gone.
Two mechanisms keep a page out of Google's search results: noindex directives and Disallow rules in robots.txt. They behave differently. A noindex tag tells Google 'you may crawl this, but do not show it in search results'. A robots.txt disallow tells Google 'do not crawl this page'. If Google cannot crawl, it never sees the noindex tag. A URL that was already indexed, or that other sites link to, can stay in the index as a 'blocked by robots.txt' result, with no snippet and no way to regain normal visibility until the directive is removed and the URL is recrawled. This is the single most common indexing failure we see in SEO audits.
In practice, when you open Google Search Console and see 'Blocked by robots.txt' under Indexing -> Pages, the root cause is almost always a too-broad disallow, a forgotten staging environment directive, or a wildcard that accidentally catches important paths like /blog/ or /products/. The fix is not just removing the line; you must verify that the directive is gone, the URL is crawlable, and Google re-indexes it. That is what this workflow delivers.
Open Google Search Console > Indexing > Pages. Filter by 'Blocked by robots.txt'. Note the URL count and date.
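Go to Settings > robots.txt to open the robots.txt report, which replaced the standalone robots.txt Tester in late 2023. Confirm that Google has fetched the current version of your file, then check each blocked path against its rules.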
For each blocked URL, run the Live URL test. Confirm the 'Crawl allowed?' status is 'No'.
Update your robots.txt to remove or narrow the disallow. Upload the new file to the root of your domain.
After the fix, use the Live URL test again. Click 'Request Indexing' once the status shows 'Crawl allowed: Yes'.
Wait 24-72 hours. Re-check the Index report. The 'Blocked' count should drop. Confirm the page appears in SERPs with a snippet.
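For more than a handful of URLs, the before-and-after checks in the steps above can be scripted against the Search Console URL Inspection API. The sketch below only interprets responses that have already been fetched. It assumes the response nests its data under `inspectionResult.indexStatusResult` with `robotsTxtState`, `coverageState`, and `lastCrawlTime` fields; confirm those names against the current API reference before relying on them. The helper names `crawl_status` and `still_blocked` are hypothetical.

```python
def crawl_status(response: dict) -> dict:
    """Extract the robots.txt-related fields from one URL Inspection response.

    Field names are assumptions based on the public API reference.
    """
    index_status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    robots_state = index_status.get("robotsTxtState", "ROBOTS_TXT_STATE_UNSPECIFIED")
    return {
        "crawl_allowed": robots_state == "ALLOWED",
        "robots_state": robots_state,
        "coverage": index_status.get("coverageState", "unknown"),
        "last_crawl": index_status.get("lastCrawlTime"),
    }


def still_blocked(responses: dict) -> list:
    """Given {url: response}, return the URLs robots.txt still disallows."""
    return sorted(
        url
        for url, response in responses.items()
        if crawl_status(response)["robots_state"] == "DISALLOWED"
    )
```

Running `still_blocked` on the sample URLs before the fix and again after it gives a quick list of anything the new robots.txt still catches.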
| Disallow Pattern | What It Blocks | Why It Breaks Indexing | Fix / Risk |
|---|---|---|---|
| Disallow: / | Entire site | Google cannot crawl anything. All pages blocked. Usually a staging or dev environment mistake. | Remove or replace with specific disallows. Risk: accidental deployment to production. |
| Disallow: /wp-admin | Every path that starts with /wp-admin | Rules are prefix matches. Without the trailing slash, the rule also blocks unrelated paths such as /wp-admin123. | Use Disallow: /wp-admin/ to scope the rule to the directory. Test the pattern before deploying. Risk: unrelated URLs that share the prefix stay blocked if the slash is omitted. |
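| Disallow: /*.pdf$ | Every URL that ends in .pdf, in any directory | Google supports the * and $ wildcards, so this rule works as written. Blocked PDFs cannot be crawled, yet they can remain indexed without a snippet if other pages link to them. | To keep PDFs out of the index, allow crawling and send an X-Robots-Tag: noindex HTTP header. Use Disallow: /pdfs/ if you only need to block a folder. Risk: a PDF URL with a query string (file.pdf?v=2) does not match the $ anchor. |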
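| Disallow: /?s= | Only URLs that begin with /?s=, i.e. search results on the site root | Rules match from the start of the path, so /shop/?s=red is not covered. The usual over-correction, Disallow: /*?s=, matches every URL whose query string starts with s=, including legitimate parameter-based pages. | Use Disallow: /search if your search pages live under that path. Add Disallow: /*?s= only after checking that no product filters use the s parameter. Risk: blocking product filters that use '?s='. |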
| Allow: /blog Disallow: / | Everything except paths that start with /blog | Google applies the most specific (longest) matching rule, so Allow: /blog wins for /blog, /blog/post, and /blogger alike, and the order of the two lines does not matter. The homepage and every other section stay blocked. | Remove Disallow: / and disallow only the specific paths you need to hide. Risk: accidental full blocking if the Allow line is ever dropped. |
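The patterns in the table can be checked offline before a robots.txt change is deployed. The following is a minimal sketch of Google-style matching as documented publicly: `*` matches any character sequence, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. It is not Google's parser, and the function names are hypothetical. Python's built-in `urllib.robotparser` is not used here because it does not implement these wildcard and precedence rules.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal segments and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, url_path: str) -> bool:
    """rules: iterable of (directive, pattern) pairs for one user-agent group.

    Returns True when crawling url_path (path plus query string) is allowed.
    """
    best_length = -1
    allowed = True  # no matching rule means the URL is crawlable
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow restricts nothing
            continue
        if pattern_to_regex(pattern).match(url_path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_length
            tie_for_allow = len(pattern) == best_length and is_allow
            if longer or tie_for_allow:
                best_length = len(pattern)
                allowed = is_allow
    return allowed
```

For example, `is_allowed([("Disallow", "/wp-admin")], "/wp-admin123")` returns `False`, while the same URL is allowed once the rule is changed to `/wp-admin/`.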
The problem: An e-commerce site saw 2,847 URLs blocked by robots.txt in GSC. The blocked URLs were all under /blog/ plus product category pages. The robots.txt had: Disallow: /blog/*?page= and Disallow: /products/*?filter=.
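The mistake: The developer intended to block paginated listing URLs but placed a wildcard after the folder. In robots.txt, the wildcard * matches any sequence of characters, including slashes. So Disallow: /blog/*?page= applied at every depth under /blog/, catching any URL that carries a ?page= parameter, such as /blog/2023/10/post-title?page=2, rather than only the top-level listing. A URL without that parameter is not matched by the rule.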
The fix: Changed to Disallow: /blog/page/ and Disallow: /products/?filter=. Tested both patterns in GSC robots.txt Tester. Then ran Live URL tests on 5 sample blocked URLs. All showed 'Crawl allowed: Yes'. Requested indexing. After 3 days, the blocked count dropped from 2,847 to 0. The blog recovered 30% organic traffic within 10 days.
Duplicate lists: Some CMS plugins append multiple copies of the same disallow rule. Identical duplicates are harmless to Google, because repetition and rule order do not change which rule matches, but they make the file harder to audit and can hide a conflicting rule. Check your robots.txt for repeated lines. While you are at it, run the Rich Results Test on the affected pages to validate structured data, since blocked pages often have broken schema too.
Empty results: You may run robots.txt Tester and see 'No issues detected'. But your GSC Index report still shows blocked URLs. This happens when the robots.txt file is served with an incorrect content-type or is gzipped. Google cannot parse it. Check the HTTP response headers: Content-Type must be text/plain.
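The header check described above can be automated. The sketch below is a hypothetical helper, `diagnose_robots_response`, that inspects a response you have already fetched with any HTTP client. The text/plain requirement comes from the paragraph above. The 500 KiB size limit and the UTF-8 expectation are taken from Google's published robots.txt guidance. Note that a gzip body that was never decompressed fails the UTF-8 check.

```python
MAX_ROBOTS_BYTES = 500 * 1024  # Google documents a 500 KiB parsing limit


def diagnose_robots_response(status: int, headers: dict, body: bytes) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    lowered = {key.lower(): value for key, value in headers.items()}

    if status != 200:
        problems.append(f"unexpected HTTP status {status}")

    # Ignore parameters such as '; charset=utf-8' when comparing media types.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(
            f"Content-Type is {media_type or 'missing'}, expected text/plain"
        )

    if len(body) > MAX_ROBOTS_BYTES:
        problems.append("file exceeds 500 KiB; content past the limit is ignored")

    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8 (still compressed or mis-encoded?)")

    return problems
```

The status, headers, and raw body can come from `curl -i` output or from `urllib.request`; the function itself makes no network calls.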
Slow vendors: CDN caching can serve an old robots.txt for hours. After making a change, purge the CDN cache and verify the live file via curl or a browser. If a server-level rule (for example, an Apache rewrite in .htaccess) serves robots.txt from another location, Google sees that response, not the file you edited.
Weak pages: A page blocked by robots.txt may still appear in the index if it has strong external links. Google shows it as 'blocked by robots.txt' but may keep the URL in the index for months. You must remove the disallow AND request indexing to force a recrawl.
Use Google Search Console's URL Inspection tool. Enter the page URL. Look for the 'Crawl allowed?' status. If it says 'No', the page is blocked by robots.txt. Then open the robots.txt report (Settings > robots.txt), which replaced the retired robots.txt Tester, to view the file Google last fetched and identify the disallow rule that matches the URL.
Google may have indexed the page before the disallow was added, or the page has strong external links. The index entry will show a 'Blocked by robots.txt' label with no snippet. To restore the page, remove the disallow and request indexing via URL Inspection. To remove it from the index instead, remove the disallow first and then add a noindex tag, because Google cannot see the tag while the URL is blocked.
Yes. If Google cannot crawl a page due to disallow, but the page returns a 200 HTTP status, GSC may classify it as a soft 404 because the crawler cannot fetch the content. The fix: remove the disallow and ensure the page returns a meaningful response.
It typically takes 1 to 14 days. Google's crawl queue depends on the page's priority, sitemap submission, and your site's overall crawl budget. Use URL Inspection to request indexing immediately after the fix – this can reduce the wait to 24-72 hours.
Disallow prevents crawling. Noindex (via meta tag or HTTP header) prevents indexing. If you Disallow a page, Google cannot see its noindex tag. To remove a page from the index, use noindex AND allow crawling. Robots.txt noindex is not supported by Google. Use the tag or header.
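The conflict between the two mechanisms can be detected mechanically. The sketch below assumes you already know whether robots.txt blocks the URL and have the page's HTML and response headers from your own fetch. The names `RobotsMetaParser`, `has_noindex`, and `diagnose` are hypothetical. As a simplification, it ignores X-Robots-Tag values scoped to a specific user agent (for example `googlebot: noindex`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, headers: dict) -> bool:
    """True if a noindex directive appears in the meta tags or X-Robots-Tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            header_value = value.lower()
    header_tokens = {token.strip() for token in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_tokens


def diagnose(blocked_by_robots: bool, html: str, headers: dict) -> str:
    """Describe how the crawl block and the noindex directive interact."""
    noindex = has_noindex(html, headers)
    if blocked_by_robots and noindex:
        return "conflict: noindex is present but unreachable while crawling is disallowed"
    if blocked_by_robots:
        return "blocked: the URL may stay indexed without a snippet"
    if noindex:
        return "ok: crawlable, and noindex can be honored on the next crawl"
    return "ok: crawlable and indexable"
```

A "conflict" result is the case described above: the page carries a noindex directive that Google cannot read until the Disallow rule is removed.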
Ensure your staging site is password-protected or uses a different hostname. Do not rely on Disallow: / alone. Google may still crawl if it finds links. Best practice: use HTTP authentication (401) or a firewall IP block. For production, check your robots.txt file for any leftover staging disallows.
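A leftover staging rule can also be caught before it reaches production. The sketch below is a hypothetical pre-deploy check, `blocks_entire_site`, that scans robots.txt text for a bare `Disallow: /` in a group that applies to all crawlers or to Googlebot. It is deliberately strict: it flags the rule even if an Allow line would partly override it, on the assumption that a production file should not contain a site-wide disallow at all.

```python
def blocks_entire_site(robots_txt: str, agents=("*", "googlebot")) -> bool:
    """Return True if a group for one of `agents` contains a bare 'Disallow: /'."""
    current_agents = []
    in_rules = False  # True once the current group has listed a rule line
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A user-agent line after rule lines starts a new group.
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if (
                field == "disallow"
                and value == "/"
                and any(agent in agents for agent in current_agents)
            ):
                return True
    return False
```

Wired into a deployment script, a `True` result for the production hostname would be a reason to stop the release and review the file by hand.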
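Yes, but be careful. Use Disallow: /*?param=value, and add Disallow: /*&param=value if the parameter can appear later in the query string. Test the pattern against sample URLs before deploying. Over-blocking can remove useful content. Google retired the URL Parameters tool in Search Console in 2022, so if you need to handle many parameters, rely on canonical tags and consistent internal linking instead.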
This is usually caused by a plugin or CMS that appends rules without deduplication. Google handles identical duplicates gracefully (a repeated rule has no extra effect), but they add noise. Clean up your robots.txt by removing duplicate lines. Check your .htaccess or server config for rewrites that serve a different robots.txt.
Use the URL Inspection tool in GSC. Click 'Test Live URL'. Wait for the result. If 'Crawl allowed?' shows 'Yes', the fix works. Then click 'Request Indexing'. Wait 24 hours and check the page status again.
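Wildcards and missing slashes can match unintended paths. For example, Disallow: /blog (no trailing slash) also blocks /blogger/ and /blogging/, while Disallow: /blog/* behaves exactly like Disallow: /blog/ because a trailing wildcard is implied. Use specific paths like Disallow: /blog/ and avoid wildcards unless you test each one. A mistake can block thousands of pages.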
After fixing robots.txt blocking, the next concern is how Google treats newly unblocked pages. A sudden flood of newly indexed pages can trigger algorithmic penalties if your link velocity spikes. For advanced management of indexing pace, see the guide on drip-feed indexing and managing link velocity. This is especially relevant for agencies handling large-scale site migrations or content launches.