11 changes: 11 additions & 0 deletions .mcp.json
@@ -0,0 +1,11 @@
{
  "mcpServers": {
    "buddhist-uni": {
      "command": "/Users/lsh/projects/divers/buddhist-uni.github.io/search/.venv/bin/python",
      "args": ["-m", "search.server.mcp_server"],
      "env": {
        "PYTHONPATH": "/Users/lsh/projects/divers/buddhist-uni.github.io"
      }
    }
  }
}
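The committed `.mcp.json` hardcodes absolute paths from one developer's machine, so other checkouts would need to edit it. A minimal sketch of generating the same structure for any checkout follows; `build_mcp_config` is a hypothetical helper that is not part of this PR, and the `.venv/bin/python` layout assumes a POSIX virtualenv.

```python
import json
from pathlib import Path


def build_mcp_config(project_root: Path) -> dict:
    """Build the .mcp.json structure for an arbitrary checkout.

    Hypothetical helper: the PR ships a static file, not a generator.
    """
    root = project_root.resolve()
    return {
        "mcpServers": {
            "buddhist-uni": {
                # POSIX virtualenv layout (assumption); Windows uses Scripts\python.exe
                "command": str(root / "search" / ".venv" / "bin" / "python"),
                "args": ["-m", "search.server.mcp_server"],
                "env": {"PYTHONPATH": str(root)},
            }
        }
    }


if __name__ == "__main__":
    # Print a config for the current directory; redirect to .mcp.json if desired.
    print(json.dumps(build_mcp_config(Path.cwd()), indent=2))
```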
165 changes: 165 additions & 0 deletions search/README.md
@@ -0,0 +1,165 @@
# Buddhist University: Vector Search Engine + MCP Server

Semantic search across **4,494 Buddhist resources** (canonical texts, academic articles, audio/video, courses) using a [Qdrant](https://qdrant.tech/) vector database and the `all-MiniLM-L6-v2` embedding model.

Results are exposed through a **FastAPI REST API** and an **MCP server** that plugs directly into Claude.

---

## Architecture

```
_content/                 ← 4,494 markdown files (Jekyll source of truth)
search/
├── ingestion/            ← Pipeline: extraction → embeddings → Qdrant
│   ├── extract.py          reuses website.py + frontmatter
│   ├── embedder.py         sentence-transformers/all-MiniLM-L6-v2
│   └── ingest.py           main pipeline (batches of 100, ~25 s)
├── api/                  ← FastAPI REST (port 8001)
│   ├── main.py             app + CORS + routes
│   ├── search.py           GET /search, GET /reading-path
│   ├── courses.py          GET /courses, GET /courses/{id}, GET /teachers/{slug}
│   └── models.py           Pydantic models
├── server/               ← MCP server (5 tools for Claude)
│   ├── mcp_server.py       FastMCP + tool declarations
│   └── tools.py            Qdrant functions shared by the API and MCP
├── tests/                ← 56 tests (unit + integration)
│   ├── test_search.py
│   ├── test_api.py
│   └── pytest.ini
├── docker-compose.yml      Qdrant
├── requirements.txt
└── README.md
.mcp.json                 ← MCP config for Claude Code (project root)
```

---

## Setup (first time)

### 1. Create the Python environment

```bash
cd buddhist-uni.github.io
uv venv search/.venv --python 3.12
uv pip install --python search/.venv/bin/python -r search/requirements.txt
```

### 2. Start Qdrant

```bash
cd search && docker-compose up -d
# Check: curl localhost:6333/healthz
# Dashboard: http://localhost:6333/dashboard
```

### 3. Index the 4,494 documents (~25 seconds)

```bash
PYTHONPATH=$(pwd) search/.venv/bin/python -m search.ingestion.ingest
```

Options (append to the command above):
```bash
# Test run on 100 files
--limit 100

# Rebuild the index from scratch
--recreate

# Run a test query after ingestion
--test-query "impermanence nibbana"
```
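The architecture notes describe ingestion as running in batches of 100. How `ingest.py` actually splits its work is not shown in this diff, so the following is only a sketch of the batching idea, with `batches` as a hypothetical helper name.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def batches(items: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    # islice pulls up to `size` items; an empty list ends the loop.
    while chunk := list(islice(it, size)):
        yield chunk


docs = [f"doc-{i}" for i in range(4494)]
sizes = [len(b) for b in batches(docs)]
print(len(sizes), sizes[-1])  # 45 batches, the last one holding 94 documents
```

Each batch could then be embedded and upserted in one call, which keeps memory bounded and makes `--limit 100` equivalent to processing a single batch.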

---

## Usage

### REST API (FastAPI)

```bash
PYTHONPATH=$(pwd) search/.venv/bin/uvicorn search.api.main:app --port 8001 --reload
```

**Interactive docs** → http://localhost:8001/docs

| Endpoint | Example |
|---|---|
| `GET /search` | `/search?q=meditation+breath&limit=8` |
| `GET /search` (filters) | `/search?q=nibbana&category=canon&min_stars=4` |
| `GET /search` (tags) | `/search?q=compassion&tags=metta&tags=karuna` |
| `GET /reading-path` | `/reading-path?topic=anatta&level=beginner` |
| `GET /courses` | `/courses` |
| `GET /courses/{id}` | `/courses/mn` · `/courses/pali-primer` |
| `GET /teachers/{slug}` | `/teachers/bodhi` · `/teachers/thanissaro` |
| `GET /health` | Qdrant status |

**`/search` parameters**:
- `q`: natural-language query (required)
- `category`: one of `articles`, `canon`, `av`, `booklets`, `essays`, `monographs`, `papers`, `excerpts`, `reference`
- `tags`: repeatable (e.g. `&tags=metta&tags=meditation`)
- `course`: course slug (e.g. `mn`, `abhidhamma`)
- `min_stars`: minimum quality rating, 1–5
- `limit`: 1–20 (default 8)

**`/reading-path` parameters**:
- `topic`: subject to explore
- `level`: `beginner`, `intermediate`, or `advanced`
- `limit`: 1–20 (default 10)
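Because `tags` is passed as a repeated query parameter, building the URL by hand is error-prone. The sketch below assembles a `/search` URL with the standard library only; `search_url` and `BASE_URL` are illustrative names, and the host and port are taken from the `uvicorn` command in this README.

```python
from __future__ import annotations

from urllib.parse import urlencode

BASE_URL = "http://localhost:8001"  # port from the uvicorn command in this README


def search_url(
    q: str,
    *,
    tags: list[str] | None = None,
    category: str | None = None,
    min_stars: int | None = None,
    limit: int = 8,
) -> str:
    """Build a /search URL (hypothetical client-side helper)."""
    if not 1 <= limit <= 20:
        raise ValueError("limit must be between 1 and 20")
    params: dict[str, object] = {"q": q}
    if category is not None:
        params["category"] = category
    if min_stars is not None:
        params["min_stars"] = min_stars
    if tags:
        params["tags"] = tags  # doseq=True expands this into repeated tags=... pairs
    params["limit"] = limit
    return f"{BASE_URL}/search?{urlencode(params, doseq=True)}"


print(search_url("compassion", tags=["metta", "karuna"]))
# http://localhost:8001/search?q=compassion&tags=metta&tags=karuna&limit=8
```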

### MCP Server (Claude Code)

The `.mcp.json` file is already configured at the project root.
**Open a new Claude Code session in this folder** and the 5 tools become available automatically.

| MCP tool | Description |
|---|---|
| `search_dharma(query, tags?, category?, limit?)` | Semantic search |
| `get_course(course_id)` | Full curriculum of a course |
| `list_courses()` | List of the 16 structured courses |
| `find_by_teacher(teacher_slug)` | Resources by teacher |
| `get_reading_path(topic, level, limit)` | Guided reading path |

**Available courses**: `an` `buddha` `buddhism` `chinese-primer` `ebts` `ethics` `form` `function` `imagery` `mn` `nibbana` `nibbana-mind-stilled` `pali-new-course` `pali-primer` `philosophy` `tranquility-and-insight`

**Teachers** (examples): `bodhi` `thanissaro` `ajahn-chah` `ajahn-brahm` `analayo` `sujato` `nanavira`

---

## Tests

```bash
# Unit tests (no Qdrant needed)
PYTHONPATH=$(pwd) search/.venv/bin/pytest search/tests/ -m "not integration" -v

# Full test suite (requires Qdrant)
PYTHONPATH=$(pwd) search/.venv/bin/pytest search/tests/ -v

# Expected result: 56 passed
```

---

## Reindexing

If you add new content to `_content/`:

```bash
# Incremental reindexing (upsert, safe to re-run)
PYTHONPATH=$(pwd) search/.venv/bin/python -m search.ingestion.ingest

# Full reindexing (start from scratch)
PYTHONPATH=$(pwd) search/.venv/bin/python -m search.ingestion.ingest --recreate
```
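An upsert is only safe to re-run if each document maps to the same point ID every time. Qdrant accepts unsigned integers or UUIDs as point IDs, so one common approach is to derive a UUID deterministically from the document path. Whether `ingest.py` does this is not shown in this diff; the sketch below is an assumption, and `point_id` is a hypothetical helper.

```python
import uuid


def point_id(doc_path: str) -> str:
    """Derive a stable UUID from a document path (hypothetical helper).

    uuid5 is a name-based hash, so the same path always yields the same ID,
    and re-ingesting a file overwrites its point instead of duplicating it.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, doc_path))


first = point_id("_content/articles/example.md")   # illustrative path
second = point_id("_content/articles/example.md")
other = point_id("_content/canon/example.md")
print(first == second, first == other)
```

With random IDs, by contrast, every run without `--recreate` would add a second copy of each document.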

---

## Tech stack

| Component | Technology | Rationale |
|---|---|---|
| Vector DB | **Qdrant** (Docker) | Native metadata filters, HNSW, open source |
| Embeddings | **all-MiniLM-L6-v2** | 384 dims, local, fast (~500 docs/s), free |
| API | **FastAPI** + uvicorn | Async, auto-generated OpenAPI docs, Pydantic v2 |
| MCP | **FastMCP** (Anthropic SDK) | stdio transport, compatible with Claude Code |
| Extraction | **python-frontmatter** | Reuses the existing Jekyll pipeline |
Empty file added search/__init__.py
Empty file.
Empty file added search/api/__init__.py
Empty file.
39 changes: 39 additions & 0 deletions search/api/courses.py
@@ -0,0 +1,39 @@
"""Courses endpoints."""

from fastapi import APIRouter, HTTPException
from search.api.models import CourseDetail, CourseSummary, SearchResult
from search.server.tools import get_course as _get_course, list_courses as _list_courses, find_by_teacher as _find_by_teacher

router = APIRouter()


@router.get("/courses", response_model=list[CourseSummary], summary="List courses")
def list_courses():
    """Return the 16+ structured courses available."""
    return [CourseSummary(**c) for c in _list_courses()]


@router.get("/courses/{course_id}", response_model=CourseDetail, summary="Course detail")
def get_course(course_id: str):
    """
    Return the full curriculum of a course with all its resources.

    Examples: `mn`, `dn`, `sn`, `an`, `abhidhamma`, `meditation`, `pali-primer`, `bn4`
    """
    result = _get_course(course_id)
    if result is None:
        raise HTTPException(status_code=404, detail=f"Course '{course_id}' not found")
    return CourseDetail(**result)


@router.get("/teachers/{teacher_slug}", response_model=list[SearchResult], summary="Content by teacher")
def get_teacher(teacher_slug: str, limit: int = 20):
    """
    Return content by a teacher/author, up to `limit` items.

    Examples: `bodhi`, `thanissaro`, `ajahn-chah`, `ajahn-brahm`, `analayo`, `sujato`
    """
    raw = _find_by_teacher(teacher_slug, limit=limit)
    if not raw:
        raise HTTPException(status_code=404, detail=f"Teacher '{teacher_slug}' not found or has no indexed content")
    return [SearchResult(score=1.0, url=r.get("url", ""), **{k: v for k, v in r.items() if k != "url"}) for r in raw]
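The last line of `get_teacher` passes `score=1.0` alongside the unpacked payload. If a payload returned by `_find_by_teacher` ever carried its own `score` key (not shown in this diff, so this is an assumption), Python would raise `TypeError` for the duplicated keyword argument. A defensive sketch, with `result_kwargs` as a hypothetical helper:

```python
def result_kwargs(payload: dict, score: float = 1.0) -> dict:
    """Merge a raw payload with a fixed score without duplicating keywords."""
    # Drop any score already present so the fixed value wins.
    fields = {k: v for k, v in payload.items() if k != "score"}
    # Mirror the route's fallback of an empty URL when none is stored.
    fields.setdefault("url", "")
    return {"score": score, **fields}


print(result_kwargs({"title": "Example", "category": "canon", "score": 0.3}))
```

The route could then build each item with `SearchResult(**result_kwargs(r))`.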
65 changes: 65 additions & 0 deletions search/api/main.py
@@ -0,0 +1,65 @@
"""
Buddhist University Search API: FastAPI app.

Usage:
    PYTHONPATH=/path/to/buddhist-uni.github.io \\
    uvicorn search.api.main:app --port 8001 --reload

Docs:
    http://localhost:8001/docs
"""

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from search.api.search import router as search_router
from search.api.courses import router as courses_router

app = FastAPI(
    title="Buddhist University Search API",
    description=(
        "Semantic search across 4494+ Buddhist resources: "
        "canonical texts, academic articles, AV, structured courses."
    ),
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["GET"],
    allow_headers=["*"],
)

app.include_router(search_router, tags=["Search"])
app.include_router(courses_router, tags=["Courses"])


@app.get("/", tags=["Health"])
def root():
    return {
        "name": "Buddhist University Search API",
        "version": "1.0.0",
        "docs": "/docs",
        "endpoints": {
            "search": "GET /search?q=...&tags=...&category=...&limit=8",
            "reading_path": "GET /reading-path?topic=...&level=beginner",
            "courses": "GET /courses",
            "course_detail": "GET /courses/{id}",
            "teacher": "GET /teachers/{slug}",
        },
    }


@app.get("/health", tags=["Health"])
def health():
    from search.ingestion.qdrant_setup import get_client, COLLECTION_NAME
    client = get_client()
    info = client.get_collection(COLLECTION_NAME)
    return {
        "status": "ok",
        "indexed_documents": info.points_count,
        "collection": COLLECTION_NAME,
    }
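As written, `health` lets any exception from the Qdrant client propagate, so an unreachable Qdrant surfaces as a 500 response rather than a structured status. A sketch of a variant that reports the failure in the payload follows; `health_payload` and its `count_points` callback are hypothetical names, and the `"unavailable"` status value is an assumption rather than part of this PR.

```python
from collections.abc import Callable


def health_payload(count_points: Callable[[], int], collection: str) -> dict:
    """Return a health payload, reporting failures instead of raising."""
    try:
        indexed = count_points()
    except Exception as exc:  # broad on purpose: any client failure means "not healthy"
        return {
            "status": "unavailable",
            "collection": collection,
            "error": type(exc).__name__,
        }
    return {"status": "ok", "indexed_documents": indexed, "collection": collection}


print(health_payload(lambda: 4494, "example-collection"))
```

In the route, `count_points` could wrap `client.get_collection(COLLECTION_NAME).points_count`, and the handler could return a 503 status when the payload is not `"ok"`.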
69 changes: 69 additions & 0 deletions search/api/models.py
@@ -0,0 +1,69 @@
"""Pydantic models for the Buddhist University Search API."""

from pydantic import BaseModel, Field


class SearchResult(BaseModel):
    score: float
    title: str
    category: str
    tags: list[str] = []
    authors: list[str] = []
    course: str | None = None
    year: int | None = None
    stars: int | None = None
    url: str
    external_url: str | None = None
    minutes: int | None = None
    pages: str | None = None


class SearchResponse(BaseModel):
    query: str
    total: int
    results: list[SearchResult]


class CourseItem(BaseModel):
    title: str
    category: str
    tags: list[str] = []
    authors: list[str] = []
    year: int | None = None
    stars: int | None = None
    url: str
    minutes: int | None = None
    pages: str | None = None


class CourseDetail(BaseModel):
    id: str
    title: str
    subtitle: str = ""
    description: str = ""
    icon: str = ""
    next_courses: list[str] = []
    content_count: int
    content: list[CourseItem]


class CourseSummary(BaseModel):
    id: str
    title: str
    subtitle: str = ""
    icon: str = ""
    next_courses: list[str] = []


class ReadingPathItem(BaseModel):
    path_order: int
    level: str
    score: float
    title: str
    category: str
    tags: list[str] = []
    authors: list[str] = []
    stars: int | None = None
    url: str
    minutes: int | None = None
    pages: str | None = None