Analytics · June 30, 2026 · 8 min read
How to Audit Internal Links at Scale: A Crawl-Based Checklist
Run a crawler-based internal link audit to find broken, redirected, and low-equity links — with a step-by-step checklist and fix priority matrix.
By FluxWriter Team
An internal link audit is one of the highest-leverage activities in technical SEO, yet most teams skip it until a site migration forces their hand. Running a crawler against your own site surfaces a category of problems (broken links, redirect chains, orphaned pages, and thin equity distribution) that neither Google Search Console nor rank trackers will flag for you. This checklist walks through the process step by step so you can run the audit in a single session and prioritize fixes the same day.
What a Crawl-Based Audit Actually Finds
Before you configure a crawler, it helps to be clear about what you're looking for. Internal link problems fall into four buckets:
- Broken internal links (4xx): pages that link to URLs returning a 404 or 410. Google wastes crawl budget here and users hit dead ends.
- Redirected internal links (3xx): links pointing to a URL that redirects to the canonical destination. The redirect itself costs a small crawl and equity hop; at scale, dozens of these can drain PageRank from pages that need it.
- Low-equity pages: pages with very few inbound internal links. Pages with zero inlinks are called orphans. If Googlebot can only reach a page from your sitemap, it's effectively invisible to the crawl graph.
- Over-linked anchor text: the same keyword phrase used as anchor text across 200+ internal links, which can suppress rather than boost the target page's topical relevance signal.
Pre-Crawl Setup: Getting the Scope Right
Crawlers are only as useful as their configuration. Sloppy scope settings produce noisy exports that waste hours in spreadsheets.
Set Your Crawl Boundaries
- Start URL: use https://yourdomain.com/ (not www vs. non-www inconsistency; pick the canonical version).
- Include subfolders: decide whether to include /blog/, /docs/, /shop/ or limit the crawl to a specific section.
- Respect robots.txt: keep this on so you're auditing what Googlebot actually sees, not staging paths or admin areas.
- Follow external links: turn this off. You want only internal link data for this audit.
- JavaScript rendering: if your site uses client-side routing (React, Next.js, Vue), enable JS rendering. Otherwise, pages that are linked only from client-rendered markup will appear as orphans.
Exclude Noise
Add exclusion patterns for:
- */cart, */checkout, */account (session-based pages inflate link counts)
- /cdn-cgi/, /_next/static/ (framework assets)
- Faceted URLs with parameters like ?sort=, ?page=, ?color= unless you specifically need to audit facets
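If your crawler can't express these exclusions directly, you can apply the same filter to the exported URL list afterwards. The following is a minimal sketch, assuming the path segments, prefixes, and parameter names listed above; the constant and function names are hypothetical and not part of any crawler's API.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed exclusions, mirroring the patterns listed above.
EXCLUDED_SEGMENTS = {"cart", "checkout", "account"}
EXCLUDED_PREFIXES = ("/cdn-cgi/", "/_next/static/")
EXCLUDED_PARAMS = {"sort", "page", "color"}


def is_excluded(url: str) -> bool:
    """Return True if the URL should be left out of the audit."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Framework asset paths.
    if path.startswith(EXCLUDED_PREFIXES):
        return True
    # Session-based pages: match whole path segments only,
    # so "/cartography" is not mistaken for "/cart".
    if any(segment in EXCLUDED_SEGMENTS for segment in path.split("/")):
        return True
    # Faceted URLs: any excluded query parameter, even with an empty value.
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(name in EXCLUDED_PARAMS for name in params)
```

Matching on whole path segments is a deliberate choice here: a plain substring test would also drop unrelated URLs that merely contain the word.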
Running the Crawl: Recommended Tools
Three crawlers are well-suited for internal link audits at different scales:
| Tool | Best For | Internal Link Export |
|---|---|---|
| Screaming Frog SEO Spider | Sites up to ~500k URLs (desktop) | Inlinks tab, bulk export |
| Sitebulb | Visual link graph + depth reporting | Hints + raw export |
| Ahrefs Site Audit | Cloud-based, no desktop needed | Internal pages report |
For a mid-sized site (10k–100k URLs), Screaming Frog with the default crawl thread settings (5 threads) will finish in 30–90 minutes depending on server response time.
Once the crawl completes, you'll work from three primary exports:
- All Internal Links — every source URL → destination URL pair, with status code and anchor text
- Response Codes — filtered to 3xx and 4xx only
- Pages by Inlinks — sorted ascending to surface low-equity and orphaned pages
The Audit Checklist
Work through each section in order. The fixes are cumulative — resolving redirected links first makes the orphan analysis cleaner.
Step 1: Resolve All 4xx Internal Links
Filter the Response Codes export to 404 and 410 status codes, then cross-reference with the All Internal Links export to find which pages are doing the linking.
For each broken link, decide:
- Does a replacement page exist? Update the link.
- Was the destination removed intentionally? Remove the link or redirect the destination.
- Is the destination a recently deleted page? Restore it or 301 to the nearest relevant page.
Concrete example: An e-commerce blog links to /guides/choosing-running-shoes, which returns 404 after a URL restructure. The canonical URL is now /guides/running-shoe-buying-guide. Fix the source link directly; don't add a redirect and leave the broken link in place. Redirects are a band-aid; fixing the source link is the clean solution.
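The cross-reference in this step amounts to grouping broken destinations by the page that links to them. Here is a rough sketch, assuming each row of the link export has been loaded as a dict with hypothetical keys `source`, `destination`, and `status`; real export column names vary by crawler.

```python
from collections import defaultdict


def broken_links_by_source(links):
    """Group 404/410 destinations by the page doing the linking."""
    grouped = defaultdict(list)
    for link in links:
        if link["status"] in (404, 410):
            grouped[link["source"]].append(link["destination"])
    return dict(grouped)


# Illustrative rows only; the URLs echo the example above.
sample_links = [
    {"source": "/blog/marathon-prep", "destination": "/guides/choosing-running-shoes", "status": 404},
    {"source": "/blog/marathon-prep", "destination": "/guides/running-shoe-buying-guide", "status": 200},
    {"source": "/blog/trail-basics", "destination": "/guides/choosing-running-shoes", "status": 404},
]

for source, destinations in broken_links_by_source(sample_links).items():
    print(source, "->", destinations)
```

Grouping by source page rather than by destination makes it easier to batch edits, since one visit to a page can fix every broken link on it.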
Step 2: Clean Up Redirected Internal Links
Filter to 3xx status codes. Export the full list and group by final redirect destination to see which target URLs are reached through the most redirected links.
A redirect chain — where A → B → C — costs two hops. Any internal link pointing to A should point directly to C.
Prioritize redirects on:
- Pages in your top 20% of organic traffic
- Pages you're actively targeting for competitive keywords
- High-PageRank hub pages (category pages, pillar content)
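To point every link at the end of its chain, you need the final destination for each redirected URL. The sketch below assumes the 3xx export has been reduced to a simple mapping of redirecting URL to its immediate target; the function names and the `max_hops` limit are illustrative choices, not crawler settings.

```python
def resolve_final_url(url, redirects, max_hops=10):
    """Follow a redirect map until reaching a URL that does not redirect.

    Returns (final_url, hops). Raises ValueError on a loop or an overly long chain.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError(f"redirect loop detected at {url}")
        if hops > max_hops:
            raise ValueError(f"chain exceeds {max_hops} hops")
        seen.add(url)
    return url, hops


def links_to_rewrite(links, redirects):
    """Return (source, old_destination, final_destination, hops) for redirected links."""
    rewrites = []
    for source, destination in links:
        final, hops = resolve_final_url(destination, redirects)
        if hops > 0:
            rewrites.append((source, destination, final, hops))
    return rewrites
```

For the A → B → C chain described above, `resolve_final_url("/a", {"/a": "/b", "/b": "/c"})` returns `("/c", 2)`, so any link to `/a` should be rewritten to `/c`.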
Step 3: Identify Orphaned and Near-Orphaned Pages
Sort the Pages by Inlinks export ascending. Pages with 0 inlinks are orphans; pages with 1–2 inlinks are near-orphans worth reviewing.
For each orphan:
- Is it in your sitemap? If yes, it's technically crawlable but has no link equity. Add at least two contextually relevant internal links from pages with solid link equity.
- Is it excluded from your sitemap and has no inlinks? Decide whether to index it, noindex it, or redirect and consolidate.
A useful heuristic: any page you want to rank should have at least 3 internal links from pages that themselves have 10+ inlinks. Pages with fewer than that are essentially unsponsored by your own site.
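The thresholds in this step are easy to check programmatically once you have source and destination pairs. This is a sketch under one stated assumption: inlinks are counted as distinct linking pages, whereas some crawlers count every link instance. The function names are hypothetical.

```python
from collections import defaultdict


def inlink_sources(links):
    """Map each destination URL to the set of distinct pages linking to it."""
    sources = defaultdict(set)
    for source, destination in links:
        if source != destination:  # ignore self-links
            sources[destination].add(source)
    return sources


def classify(page, sources):
    """Label a page using the thresholds above: 0 inlinks, 1-2 inlinks, or more."""
    count = len(sources.get(page, ()))
    if count == 0:
        return "orphan"
    if count <= 2:
        return "near-orphan"
    return "ok"


def meets_heuristic(page, sources, min_links=3, min_source_inlinks=10):
    """True if at least min_links linking pages each have min_source_inlinks or more inlinks."""
    strong = [
        source
        for source in sources.get(page, ())
        if len(sources.get(source, ())) >= min_source_inlinks
    ]
    return len(strong) >= min_links
```

Run `classify` over the URLs from your sitemap rather than over the crawl alone, since a true orphan never appears as a link destination.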
Step 4: Audit Anchor Text Distribution
In the All Internal Links export, filter by destination URL for your top 10 target pages. Review the anchor text column for:
- Diversity: is the same exact-match phrase used as anchor text in 80%+ of links? Over-optimization here is a real risk.
- Relevance: are any internal links using generic anchors ("click here", "read more", "learn more") when a descriptive phrase would serve the user better?
- Missing opportunities: does the content surrounding mentions of a related topic link to the relevant target page, or does it leave the phrase unlinked?
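The diversity and relevance checks can be approximated with a simple count per destination. The sketch below measures the share of the single most common anchor (a proxy for the exact-match check, since it doesn't know your target keyword) and counts the generic anchors named above. The function name and the 0.8 default are assumptions taken from the 80% figure in this step.

```python
from collections import Counter

GENERIC_ANCHORS = {"click here", "read more", "learn more"}


def anchor_report(links, destination, max_share=0.8):
    """Summarize anchor text for one destination from (destination, anchor) pairs."""
    anchors = [anchor.strip().lower() for dest, anchor in links if dest == destination]
    if not anchors:
        return None
    counts = Counter(anchors)
    top_anchor, top_count = counts.most_common(1)[0]
    share = top_count / len(anchors)
    return {
        "total_links": len(anchors),
        "top_anchor": top_anchor,
        "top_share": share,
        "over_concentrated": share >= max_share,
        "generic_links": sum(counts[anchor] for anchor in GENERIC_ANCHORS),
    }
```

Anchors are lowercased and trimmed before counting so that "Running Shoes" and "running shoes " are treated as the same phrase.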
Step 5: Check Link Depth for Critical Pages
Crawl depth is how many clicks from the homepage a page requires. Pages buried at depth 5+ are crawled less frequently.
Export click depth from your crawler and flag any page that:
- Receives organic traffic but sits at depth 4 or higher
- Is part of a current ranking push but is only reachable through a single deep chain
Shorten depth by adding links from category pages, your blog index, or a relevant hub article.
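If your crawler doesn't export click depth, it can be derived from the link pairs with a breadth-first search from the homepage. A minimal sketch, assuming `links` is a list of (source, destination) tuples; the function name is hypothetical.

```python
from collections import deque


def click_depths(home, links):
    """Return the minimum number of clicks from `home` to each reachable page."""
    graph = {}
    for source, destination in links:
        graph.setdefault(source, []).append(destination)

    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for linked in graph.get(page, []):
            if linked not in depths:  # first visit is the shortest path in BFS
                depths[linked] = depths[page] + 1
                queue.append(linked)
    return depths
```

Pages missing from the result are unreachable by links from the homepage, which is another way to surface orphans. Re-running the function after adding a hub link shows how much a single shortcut reduces depth.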
Turning Findings Into a Prioritized Fix List
Not all findings are equal. Use a simple scoring matrix:
| Issue Type | Impact | Effort | Priority |
|---|---|---|---|
| Broken internal links (4xx) | High | Low–Medium | P1 |
| Redirect chains (2+ hops) | Medium | Low | P1 |
| Single-hop redirects on key pages | Medium | Low | P2 |
| Orphaned pages with existing traffic | High | Medium | P1 |
| Orphaned pages with zero traffic | Low | Low | P3 |
| Generic anchor text on high-value links | Medium | Low | P2 |
Assign each finding a ticket, batch fixes by template or page type, and schedule a re-crawl 2–4 weeks after the fixes are deployed to verify resolution.
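If findings live in a spreadsheet or script rather than a ticketing tool, the matrix can be applied as a lookup and a sort. This sketch transcribes the priorities from the table; the issue-type keys and the shape of each finding are hypothetical labels chosen for illustration.

```python
# Priority per issue type, transcribed from the matrix above (1 = P1).
# The keys are assumed labels, not values exported by any crawler.
PRIORITY = {
    "broken_link": 1,
    "redirect_chain": 1,
    "single_hop_redirect_key_page": 2,
    "orphan_with_traffic": 1,
    "orphan_no_traffic": 3,
    "generic_anchor_high_value": 2,
}


def prioritize(findings):
    """Sort findings so P1 items come first; input order is kept within a priority."""
    return sorted(findings, key=lambda finding: PRIORITY[finding["issue"]])


findings = [
    {"url": "/old-landing", "issue": "orphan_no_traffic"},
    {"url": "/guides/choosing-running-shoes", "issue": "broken_link"},
    {"url": "/category/shoes", "issue": "single_hop_redirect_key_page"},
]

for finding in prioritize(findings):
    print(f"P{PRIORITY[finding['issue']]}", finding["url"])
```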
FAQ
How often should I run an internal link audit? For sites publishing more than 20 articles per month or running frequent promotions that create and delete pages, a quarterly crawl is the minimum. Larger e-commerce catalogs with high page churn benefit from monthly automated audits using a cloud crawler.
Does fixing redirected internal links actually improve rankings? The evidence is directional rather than definitive. Removing redirect hops ensures the full link equity value passes to the destination without loss at intermediate steps. The ranking impact is most visible on pages that are close to a competitive threshold — already on page 2 — where marginal equity gains are more likely to push movement.
What's the difference between an internal link audit and a site crawl? A site crawl is the data collection step; an internal link audit is the analytical process applied to that data. A crawl tells you what exists. The audit is the structured review of link source, destination, status, anchor text, and depth to identify and fix structural problems. You need a crawl to do the audit, but running a crawl without a defined audit framework produces a spreadsheet without a clear action plan.
Practical Takeaway
Run the crawl, export the three core reports, and work the checklist in order: broken links first, then redirects, then orphans, then anchor text, then link depth. The whole process takes 4–6 hours for a site under 50k pages. Re-crawl after fixes are deployed and track change over time. A clean internal link graph compounds, because every new page you publish inherits the equity structure you've already built.
If you're also building out content to support your link equity targets, tools like FluxWriter can help you map new articles to the gaps your internal link audit surfaces, so the content you publish already knows where it fits in your site architecture.