[codex] fix Search Console indexing and redirects #136
Conversation
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
This PR aims to improve Search Console indexing by canonicalizing API doc URLs (including trailing slashes), cleaning sitemap/canonical output post-build, and fixing generated API sidebar cleanup so removed alias docs don’t leave stale sidebar items.
Changes:
- Adds a post-build SEO rewrite step to normalize canonicals and filter/normalize sitemap entries.
- Prunes generated `auth-*.api.mdx` alias docs and updates sidebar normalization to drop missing doc references.
- Updates various docs/config to use versioned, trailing-slash URLs and pins Node to 22.16.0.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| `static/_redirects` | Adjusts commented redirect examples (no active redirects added in this diff). |
| `src/pages/legal/developer-terms.mdx` | Updates a legal doc link to an absolute `/legal/.../` path. |
| `scripts/set-api-displayed-sidebars.js` | Adds the prelive docs dir and filters generated sidebars to remove missing doc IDs before dedupe. |
| `scripts/prune-generated-api-aliases.js` | New cleanup script deleting generated `auth-*.api.mdx` alias docs. |
| `scripts/postbuild-seo.js` | New postbuild script to rewrite HTML canonicals and rewrite/filter sitemap URLs. |
| `package.json` | Wires the new cleanup + postbuild SEO scripts into `gen-all` and `build`. |
| `docusaurus.config.js` | Enables trailing slashes, ignores certain paths in the sitemap, updates navbar links to versioned URLs. |
| `docs/**.mdx` | Updates internal links to the new versioned canonical doc URLs. |
| `.node-version` | Pins the Node version to 22.16.0. |
```
# Uncomment if you want to enforce no trailing slashes:

# /*/ /:splat 301
```
The PR description mentions adding targeted legacy redirects for reported 404 URLs, but this diff only changes commented example rules and does not add any active redirect entries. If redirects are intended to ship with this PR, add explicit redirect lines (uncommented) for the known legacy 404 paths so the platform actually serves 301/308s.
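If active redirects are meant to ship here, Netlify-style `_redirects` entries would look like the sketch below. The paths are placeholders only; the actual legacy 404 URLs are not shown in this diff and must be substituted in.

```
# Placeholder paths — substitute the real legacy 404 URLs:
/getting-started     /docs/1.0/getting-started/   301
/api-v1/*            /api/:splat                  301
```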
```
# Handle SPA-style fallback for 404s (already handled by Docusaurus static build)

# /docs/* /docs/:splat 200
```
```
# /old-path /new-path 301

# /api-v1/* /api/:splat 301
```
```js
const urlEntries = [...source.matchAll(/<url><loc>([^<]+)<\/loc>[\s\S]*?<\/url>/g)];

const filteredEntries = urlEntries
  .map((match) => {
    const [entry, rawUrl] = match;
    const normalizedUrl = normalizeSiteUrl(rawUrl);
    const pathname = new URL(normalizedUrl).pathname;

    if (shouldDropSitemapPath(pathname)) {
      return null;
    }

    return entry.replace(rawUrl, normalizedUrl);
  })
  .filter(Boolean);
```
The sitemap `<url>` matching regex requires `<url><loc>` with no whitespace/newlines between tags. Many sitemap generators (including common Docusaurus outputs) format as `<url>\n  <loc>...`, which would cause `urlEntries` to be empty and the script to overwrite sitemap.xml with an empty `<urlset>` (dropping all URLs). Make the matcher resilient to whitespace (e.g., allow `\s*` between tags) or parse the XML with an XML parser to avoid destructive rewrites.
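A minimal sketch of the whitespace-tolerant matcher on a Docusaurus-style (newline-indented) sitemap. The helpers `normalizeSiteUrl` and `shouldDropSitemapPath` are simplified stand-ins for the script's own functions, not its actual implementation:

```javascript
// Stand-in helpers (assumptions, not the real postbuild-seo.js logic):
const normalizeSiteUrl = (url) => (url.endsWith('/') ? url : `${url}/`);
const shouldDropSitemapPath = (pathname) => pathname.startsWith('/search');

// Newline-indented output like Docusaurus' sitemap plugin produces:
const source = `<urlset>
<url>
  <loc>https://example.com/docs/intro</loc>
</url>
<url>
  <loc>https://example.com/search</loc>
</url>
</urlset>`;

// \s* between <url> and <loc> tolerates the newline and indentation,
// so urlEntries is no longer empty on formatted sitemaps.
const urlEntries = [...source.matchAll(/<url>\s*<loc>([^<]+)<\/loc>[\s\S]*?<\/url>/g)];

const filteredEntries = urlEntries
  .map(([entry, rawUrl]) => {
    const normalizedUrl = normalizeSiteUrl(rawUrl);
    if (shouldDropSitemapPath(new URL(normalizedUrl).pathname)) return null;
    return entry.replace(rawUrl, normalizedUrl);
  })
  .filter(Boolean);

console.log(filteredEntries.length); // 1 — the /search entry is dropped, /docs/intro survives
```

With the original `/<url><loc>/` pattern, both entries above would be missed and the rewritten sitemap would be empty.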
```js
const siteDir = path.resolve(__dirname, '..');
const docsDirs = [
  'docs/content_apis_versioned',
  'docs/user_related_apis_prelive',
  'docs/user_related_apis_versioned',
  'docs/oauth2_apis_versioned',
  'docs/search_apis_versioned',
```
This script now relies on siteDir-relative computations (`getDocId()` uses `path.relative(siteDir, filePath)`), but `docsDirs` entries are defined as relative strings. That makes correctness depend on the process CWD and can lead to mismatched doc IDs (and over-filtering of sidebar items) if the script is run from outside the repo root. Consider normalizing `docsDirs` to absolute paths (e.g., mapping each entry through `path.join(siteDir, dir)`) and keeping the rest of the logic consistently absolute.