Skip to content

[codex] fix Search Console indexing and redirects#136

Closed
basit3407 wants to merge 10 commits intomainfrom
codex/search-console-indexing-cleanup-v2
Closed

[codex] fix Search Console indexing and redirects#136
basit3407 wants to merge 10 commits intomainfrom
codex/search-console-indexing-cleanup-v2

Conversation

@basit3407
Copy link
Copy Markdown
Collaborator

Summary

  • canonicalize API docs to versioned URLs and clean sitemap/canonical output for Search Console
  • add targeted legacy redirects for the reported 404 URLs and keep Cloudflare Pages on Node 22.16.0
  • fix generated API cleanup order so deleted auth alias docs do not leave stale sidebar entries

Verification

  • yarn build
  • built locally on Node 22.16.0 from a fresh branch off origin/main
@basit3407 basit3407 requested a review from Copilot April 21, 2026 05:17
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 21, 2026

Deploying qf-api-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: e86b410
Status:🚫  Build failed.

View logs

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR aims to improve Search Console indexing by canonicalizing API doc URLs (including trailing slashes), cleaning sitemap/canonical output post-build, and fixing generated API sidebar cleanup so removed alias docs don’t leave stale sidebar items.

Changes:

  • Adds a post-build SEO rewrite step to normalize canonicals and filter/normalize sitemap entries.
  • Prunes generated auth-*.api.mdx alias docs and updates sidebar normalization to drop missing doc references.
  • Updates various docs/config to use versioned, trailing-slash URLs and pins Node to 22.16.0.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
static/_redirects Adjusts commented redirect examples (no active redirects added in this diff).
src/pages/legal/developer-terms.mdx Updates a legal doc link to an absolute /legal/.../ path.
scripts/set-api-displayed-sidebars.js Adds prelive docs dir and filters generated sidebars to remove missing doc IDs before dedupe.
scripts/prune-generated-api-aliases.js New cleanup script deleting generated auth-*.api.mdx alias docs.
scripts/postbuild-seo.js New postbuild script to rewrite HTML canonicals and rewrite/filter sitemap URLs.
package.json Wires new cleanup + postbuild SEO scripts into gen-all and build.
docusaurus.config.js Enables trailing slashes, ignores certain paths in sitemap, updates navbar links to versioned URLs.
docs/**.mdx Updates internal links to the new versioned canonical doc URLs.
.node-version Pins Node version to 22.16.0.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread static/_redirects Outdated
# Uncomment if you want to enforce no trailing slashes:

# /\*/ /:splat 301
# /*/ /:splat 301
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions adding targeted legacy redirects for reported 404 URLs, but this diff only changes commented example rules and does not add any active redirect entries. If redirects are intended to ship with this PR, add explicit redirect lines (uncommented) for the known legacy 404 paths so the platform actually serves 301/308s.

Copilot uses AI. Check for mistakes.
Comment thread static/_redirects Outdated
# Handle SPA-style fallback for 404s (already handled by Docusaurus static build)

# /docs/\* /docs/:splat 200
# /docs/* /docs/:splat 200
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions adding targeted legacy redirects for reported 404 URLs, but this diff only changes commented example rules and does not add any active redirect entries. If redirects are intended to ship with this PR, add explicit redirect lines (uncommented) for the known legacy 404 paths so the platform actually serves 301/308s.

Copilot uses AI. Check for mistakes.
Comment thread static/_redirects Outdated
# /old-path /new-path 301

# /api-v1/\* /api/:splat 301
# /api-v1/* /api/:splat 301
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions adding targeted legacy redirects for reported 404 URLs, but this diff only changes commented example rules and does not add any active redirect entries. If redirects are intended to ship with this PR, add explicit redirect lines (uncommented) for the known legacy 404 paths so the platform actually serves 301/308s.

Copilot uses AI. Check for mistakes.
Comment thread scripts/postbuild-seo.js Outdated
Comment on lines +145 to +159
const urlEntries = [...source.matchAll(/<url><loc>([^<]+)<\/loc>[\s\S]*?<\/url>/g)];

const filteredEntries = urlEntries
.map((match) => {
const [entry, rawUrl] = match;
const normalizedUrl = normalizeSiteUrl(rawUrl);
const pathname = new URL(normalizedUrl).pathname;

if (shouldDropSitemapPath(pathname)) {
return null;
}

return entry.replace(rawUrl, normalizedUrl);
})
.filter(Boolean);
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sitemap <url> matching regex requires <url><loc> with no whitespace/newlines between tags. Many sitemap generators (including common Docusaurus outputs) format as <url>\\n <loc>..., which would cause urlEntries to be empty and the script to overwrite sitemap.xml with an empty <urlset> (dropping all URLs). Make the matcher resilient to whitespace (e.g., allow \\s* between tags) or parse the XML with an XML parser to avoid destructive rewrites.

Copilot uses AI. Check for mistakes.
Comment on lines 6 to 12
const siteDir = path.resolve(__dirname, '..');
const docsDirs = [
'docs/content_apis_versioned',
'docs/user_related_apis_prelive',
'docs/user_related_apis_versioned',
'docs/oauth2_apis_versioned',
'docs/search_apis_versioned',
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script now relies on siteDir-relative computations (getDocId() uses path.relative(siteDir, filePath)), but docsDirs are defined as relative strings. That makes correctness depend on the process CWD and can lead to mismatched doc IDs (and over-filtering sidebar items) if the script is run from outside the repo root. Consider normalizing docsDirs to absolute paths (e.g., mapping each through path.join(siteDir, dir)) and keeping the rest of the logic consistently absolute.

Copilot uses AI. Check for mistakes.
@basit3407
Copy link
Copy Markdown
Collaborator Author

Replaced by #141, which recreates the intended #131 Search Console cleanup scope cleanly on top of current main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants