
[codex] add web bot auth and AI crawler policies#132

Merged
basit3407 merged 4 commits into main from codex/web-bot-auth-ai-policies
Apr 20, 2026

Conversation

@basit3407
Collaborator

What changed

  • added a JWKS at /.well-known/http-message-signatures-directory so the site can publish a Web Bot Auth verification directory
  • added a Cloudflare Pages header rule to serve that endpoint as application/http-message-signatures-directory+json
  • expanded robots.txt with explicit AI crawler rules and Content-Signal directives for training, search, and AI input preferences
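The published directory is a JWKS-shaped JSON file. A minimal sketch of what static/.well-known/http-message-signatures-directory can look like — the "x" value below is a placeholder Ed25519 public key, not the key this PR actually publishes:

```json
{
  "keys": [
    {
      "kty": "OKP",
      "crv": "Ed25519",
      "x": "JrQLj5P_89iXES9-vFgrIy29clF9CC_oPPsw3c5D0bs"
    }
  ]
}
```

Web Bot Auth verifies request signatures against keys published here, so the file only needs to expose public key material.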

Why

The site was missing the bot identity and crawler policy signals expected by agent-readiness checks:

  • no public Web Bot Auth directory
  • no explicit AI crawler blocks for bots such as GPTBot, OAI-SearchBot, Claude-Web, and Google-Extended
  • no Content-Signal declarations in robots.txt
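As a sketch of the kind of rules added — user-agent names come from the list above, the Content-Signal line follows Cloudflare's Content Signals proposal, and the exact directives in the repo may differ:

```
# static/robots.txt (sketch, not the committed file)

# Block crawlers used primarily for model training
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow search-oriented crawling
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /

# Declared preferences: training opted out, search and live AI input allowed
Content-Signal: search=yes, ai-input=yes, ai-train=no
```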

Impact

  • receiving sites can discover the site's Web Bot Auth public key directory
  • AI crawlers now receive explicit per-agent crawl instructions instead of only the wildcard policy
  • training is opted out while search and live AI input remain allowed by declared policy

Validation

  • ran git diff --cached --check
  • parsed and validated the JWKS shape with Node
  • did not run a full Docusaurus production build here because this repo's build path regenerates large API docs and exceeded the CLI timeout during verification
@basit3407 basit3407 marked this pull request as ready for review April 20, 2026 06:42
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 20, 2026

Deploying qf-api-docs with Cloudflare Pages

Latest commit: 9653bea
Status: ✅  Deploy successful!
Preview URL: https://18ede448.qf-api-docs.pages.dev
Branch Preview URL: https://codex-web-bot-auth-ai-polici.qf-api-docs.pages.dev


Contributor

Copilot AI left a comment


Pull request overview

Adds agent-readiness signals for automated crawlers by publishing a Web Bot Auth verification directory endpoint, ensuring it’s served with the correct content-type, and expanding robots.txt with explicit AI crawler policies.

Changes:

  • Add a Web Bot Auth-style key directory at /.well-known/http-message-signatures-directory.
  • Configure Cloudflare Pages headers to serve the directory with application/http-message-signatures-directory+json.
  • Expand robots.txt with per-user-agent AI crawler rules and Content-Signal directives.
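The header rule described above can be sketched in the Cloudflare Pages _headers format (a path pattern followed by indented header lines); this mirrors the description rather than quoting the committed file:

```
/.well-known/http-message-signatures-directory
  Content-Type: application/http-message-signatures-directory+json
```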

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • static/robots.txt — adds explicit crawler groups (allow/deny) and Content-Signal declarations.
  • static/_headers — adds a Pages header rule to serve the well-known directory with a specific JSON media type.
  • static/.well-known/http-message-signatures-directory — publishes a JSON key directory (JWK-like) for bot auth discovery.


Comment thread: static/robots.txt
Comment thread: static/robots.txt (outdated)
Comment thread: static/robots.txt (outdated)
@basit3407 basit3407 merged commit 74998fd into main Apr 20, 2026
1 check passed


2 participants