feat: add GitHub as a first-class data source #136

Open
zerone0x wants to merge 1 commit into mvanhorn:main from zerone0x:feat/github-source

Summary

Closes #134. Adds GitHub Issues and PRs as a structured search source using the GitHub Search API, following the existing 7-layer source pattern.

  • Source module (github.py): search via api.github.com/search/issues, parse responses, enrich top items with comment threads
  • Schema: GitHubItem dataclass with engagement metrics, labels, body snippets, and comment insights
  • Normalize/Score/Dedupe: full pipeline; the engagement formula weights comments (0.55) over reactions (0.45), since active discussion is a stronger signal on GitHub
  • Orchestration: parallel execution via ThreadPoolExecutor with per-depth timeouts (quick: 30s, default: 60s, deep: 90s)
  • Render: compact view, status line, context snippets, and full markdown output sections
  • Config: optional GITHUB_TOKEN env var for higher rate limits (30 req/min authenticated vs 10 req/min unauthenticated); the source is always available, with or without a token
  • Query tiering: added as tier2 for product, concept, how_to, comparison query types
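The orchestration bullet above can be illustrated with a minimal sketch. It assumes each source is a plain callable taking the query string; `run_sources`, the `sources` mapping, and the `timeouts` override are hypothetical names for illustration and are not taken from this PR. Only the three timeout values come from the description.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Dict, List, Optional

# Per-depth timeouts in seconds, as listed in the PR description.
DEPTH_TIMEOUTS = {"quick": 30, "default": 60, "deep": 90}

Source = Callable[[str], List[dict]]


def run_sources(
    sources: Dict[str, Source],
    query: str,
    depth: str = "default",
    timeouts: Optional[Dict[str, float]] = None,  # hypothetical override, handy in tests
) -> Dict[str, List[dict]]:
    """Run every source in parallel under one shared deadline for the given depth."""
    timeout = (timeouts or DEPTH_TIMEOUTS)[depth]
    deadline = time.monotonic() + timeout
    results: Dict[str, List[dict]] = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    try:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[name] = future.result(timeout=remaining)
            except FutureTimeout:
                results[name] = []  # source too slow: degrade to "no results"
            except Exception:
                results[name] = []  # source failed: do not break the other sources
    finally:
        # Do not block on stragglers; drop anything that has not started yet.
        pool.shutdown(wait=False, cancel_futures=True)
    return results
```

The shared deadline keeps the total wait bounded by the depth's timeout rather than by the timeout multiplied by the number of sources. A thread that is already running cannot be interrupted, so a slow source keeps running in the background until it returns.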

Design decisions

  • Modeled after the HackerNews source (closest analog: free API, similar engagement signals)
  • Progressive unlock: works without auth, better with optional token
  • Comments weighted higher than reactions in engagement scoring because active discussion threads are a stronger quality signal on GitHub than passive thumbs-up
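To make the weighting concrete, here is a small sketch of the stated formula applied to raw counts. The `Item` class is a hypothetical stand-in for `GitHubItem`, and any scaling or normalization the actual scorer may apply is not described here and is left out.

```python
from dataclasses import dataclass

# Weights as stated in this PR: comments count slightly more than reactions.
REACTION_WEIGHT = 0.45
COMMENT_WEIGHT = 0.55


@dataclass
class Item:
    """Hypothetical stand-in for GitHubItem, reduced to the fields used here."""
    title: str
    reactions: int
    comments: int


def engagement(item: Item) -> float:
    """Weighted sum of raw reaction and comment counts."""
    return REACTION_WEIGHT * item.reactions + COMMENT_WEIGHT * item.comments


items = [
    Item("Thumbs-up magnet", reactions=20, comments=2),
    Item("Long discussion", reactions=2, comments=20),
]
ranked = sorted(items, key=engagement, reverse=True)
print([i.title for i in ranked])
```

With equal total activity, the issue whose activity is mostly comments ranks first, which is the behavior the design decision describes.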

Test plan

  • 17 new unit tests covering parsing, normalization, scoring, sorting, and serialization (tests/test_github.py)
  • Full test suite passes (900/900), apart from 3 pre-existing failures unrelated to this PR
  • Manual test with --search github flag
  • Manual test with GITHUB_TOKEN set for authenticated rate limits

🤖 Generated with Claude Code

Add GitHub Issues and PRs as a structured search source using the GitHub
Search API. Works without auth (10 req/min); an optional GITHUB_TOKEN
raises the rate limit to 30 req/min.
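
The two modes described above might be wired up roughly as follows. This is a sketch under assumptions, not the code in github.py: `build_search_url`, `build_headers`, the `User-Agent` value, and the choice to sort by comments are all hypothetical. The endpoint, the `GITHUB_TOKEN` variable, and the standard GitHub `Authorization: Bearer` header are the only fixed parts.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/issues"


def build_search_url(query: str, per_page: int = 20) -> str:
    """Build a Search API URL; sorting by comments is an assumption of this sketch."""
    params = {"q": query, "sort": "comments", "order": "desc", "per_page": per_page}
    return f"{SEARCH_URL}?{urlencode(params)}"


def build_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return request headers, adding auth only when GITHUB_TOKEN is set."""
    env = os.environ if env is None else env
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "github-source-sketch",  # hypothetical client name
    }
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        # Authenticated search requests get the higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.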

Implements the full 7-layer source pattern:
- Source module (github.py): search, parse, comment enrichment
- Schema: GitHubItem dataclass with engagement, labels, comments
- Normalize: normalize_github_items() with date confidence
- Score: engagement formula (0.45*reactions + 0.55*comments)
- Dedupe: dedupe_github() wrapper
- Orchestration: parallel execution with configurable timeouts
- Render: compact, status, context, and full markdown output
- Config: GITHUB_TOKEN env var, source tiering (tier2 for product/
  concept/how_to/comparison queries)

Includes 17 unit tests covering parsing, normalization, scoring,
sorting, and serialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>