
Commit fdfb6c9

Lsh0x and claude committed
test(search): 56 unit + integration tests + README
- test_search.py: embedder (singleton, L2 norm, batching), extract (payload, categories, stars), Qdrant (category/tag filters, score ordering, relevance), MCP tools (courses, teacher, reading path at every level)
- test_api.py: every FastAPI endpoint (search, courses, teachers, reading-path) plus parameter validation (max limit, level pattern, query min_length)
- pytest.ini: integration marker registered
- README.md: full setup, endpoints, MCP tools, reindexing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 062e73f commit fdfb6c9

4 files changed

Lines changed: 606 additions & 0 deletions

search/README.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
# Buddhist University: Vector Search Engine + MCP Server

Semantic search across **4,494 Buddhist resources** (canonical texts, academic articles, AV material, courses) using a [Qdrant](https://qdrant.tech/) vector database and the `all-MiniLM-L6-v2` embedding model.

Results are exposed through a **FastAPI REST API** and an **MCP server** that plugs directly into Claude.

---

## Architecture

```
_content/                  ← 4,494 markdown files (Jekyll source of truth)
search/
├── ingestion/             ← Pipeline: extraction → embeddings → Qdrant
│   ├── extract.py         reuses website.py + frontmatter
│   ├── embedder.py        sentence-transformers/all-MiniLM-L6-v2
│   └── ingest.py          main pipeline (batches of 100, ~25 s)
├── api/                   ← FastAPI REST (port 8001)
│   ├── main.py            app + CORS + routes
│   ├── search.py          GET /search, GET /reading-path
│   ├── courses.py         GET /courses, GET /courses/{id}, GET /teachers/{slug}
│   └── models.py          Pydantic models
├── server/                ← MCP server (5 tools for Claude)
│   ├── mcp_server.py      FastMCP + tool declarations
│   └── tools.py           Qdrant functions shared by the API and the MCP server
├── tests/                 ← 56 tests (unit + integration)
│   ├── test_search.py
│   └── test_api.py
├── pytest.ini             registers the integration marker
├── docker-compose.yml     Qdrant
├── requirements.txt
└── README.md
.mcp.json                  ← MCP config for Claude Code (project root)
```
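`ingest.py` is described above as processing documents in batches of 100. A minimal sketch of that chunking step, assuming the extracted documents fit in a list; `batched` here is a hypothetical helper, not a function taken from `ingest.py`:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int = 100) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 4,494 documents in batches of 100 -> 44 full batches + 1 batch of 94
docs = list(range(4494))
batches = list(batched(docs))
print(len(batches), len(batches[-1]))  # 45 94
```

On Python 3.12, the version used in the setup below, the standard library's `itertools.batched(docs, 100)` yields the same chunks as tuples.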

---

## Setup (first time)

### 1. Create the Python environment

```bash
cd buddhist-uni.github.io
uv venv search/.venv --python 3.12
uv pip install --python search/.venv/bin/python -r search/requirements.txt
```

### 2. Start Qdrant

```bash
# Run from the repository root; the subshell keeps you there for step 3
(cd search && docker-compose up -d)
# Check: curl localhost:6333/healthz
# Dashboard: http://localhost:6333/dashboard
```
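The `curl` check above can also be scripted. A standard-library sketch that treats an HTTP 200 from Qdrant's `/healthz` endpoint as healthy; the function name `qdrant_is_up` and the 2-second default timeout are assumptions:

```python
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if Qdrant's /healthz endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses:
        # connection refused, DNS failure or an error status -> not healthy
        return False
```

Such a helper could, for example, feed `pytest.mark.skipif(not qdrant_is_up(), reason="Qdrant not running")` so that integration tests are skipped rather than failing when the container is down.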

### 3. Index the 4,494 documents (~25 seconds)

```bash
PYTHONPATH=$(pwd) search/.venv/bin/python -m search.ingestion.ingest
```

Options (append to the command above):

```bash
# Test run on 100 files
--limit 100

# Full reindex from scratch
--recreate

# Run a test query after ingestion
--test-query "impermanence nibbana"
```

---

## Usage

### REST API (FastAPI)

```bash
PYTHONPATH=$(pwd) search/.venv/bin/uvicorn search.api.main:app --port 8001 --reload
```

**Interactive docs**: http://localhost:8001/docs

| Endpoint | Example |
|---|---|
| `GET /` | API name and list of endpoints |
| `GET /search` | `/search?q=meditation+breath&limit=8` |
| `GET /search` (filters) | `/search?q=nibbana&category=canon&min_stars=4` |
| `GET /search` (tags) | `/search?q=compassion&tags=metta&tags=karuna` |
| `GET /reading-path` | `/reading-path?topic=anatta&level=beginner` |
| `GET /courses` | `/courses` |
| `GET /courses/{id}` | `/courses/mn` · `/courses/pali-primer` |
| `GET /teachers/{slug}` | `/teachers/bodhi` · `/teachers/thanissaro` |
| `GET /health` | Qdrant status |

**`/search` parameters**:
- `q`: natural-language query (required, at least 2 characters)
- `category`: one of `articles`, `canon`, `av`, `booklets`, `essays`, `monographs`, `papers`, `excerpts`, `reference`
- `tags`: repeatable tag filter (e.g. `&tags=metta&tags=meditation`)
- `course`: course slug (e.g. `mn`, `abhidhamma`)
- `min_stars`: minimum quality rating, 1–5
- `limit`: 1–20 (default 8)
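
Because `tags` is repeated once per value, a client has to send several `tags=` pairs rather than one comma-separated string. A client-side sketch using only the standard library, assuming the API runs on `localhost:8001` as above; `build_search_url` is a hypothetical helper whose checks simply mirror the documented ranges:

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_search_url(
    q: str,
    tags: list[str] | None = None,
    category: str | None = None,
    min_stars: int | None = None,
    limit: int = 8,
    base: str = "http://localhost:8001",
) -> str:
    """Build a GET /search URL, checking the documented parameter ranges first."""
    if len(q) < 2:
        raise ValueError("q must have at least 2 characters")
    if not 1 <= limit <= 20:
        raise ValueError("limit must be between 1 and 20")
    if min_stars is not None and not 1 <= min_stars <= 5:
        raise ValueError("min_stars must be between 1 and 5")
    params: dict[str, object] = {"q": q, "limit": limit}
    if category:
        params["category"] = category
    if tags:
        params["tags"] = tags  # a list: doseq=True emits one tags=... pair per value
    if min_stars is not None:
        params["min_stars"] = min_stars
    return f"{base}/search?{urlencode(params, doseq=True)}"


print(build_search_url("compassion", tags=["metta", "karuna"]))
# http://localhost:8001/search?q=compassion&limit=8&tags=metta&tags=karuna
```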

**`/reading-path` parameters**:
- `topic`: subject to explore (required)
- `level`: `beginner` · `intermediate` · `advanced` (any other value returns 422)
- `limit`: 1–20 (default 10)
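
The integration tests in this commit expect `/reading-path` to return items numbered by `path_order` from 1 upward, each echoing the requested `level`. A minimal sketch of that numbering step, assuming the results are already ranked by score; `to_reading_path` is a hypothetical helper, and how the real endpoint selects documents for each level is not shown in this commit:

```python
from __future__ import annotations

from typing import Any

LEVELS = ("beginner", "intermediate", "advanced")


def to_reading_path(results: list[dict[str, Any]], level: str, limit: int = 10) -> list[dict[str, Any]]:
    """Number ranked results 1..n and tag each with the requested level."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    if not 1 <= limit <= 20:
        raise ValueError("limit must be between 1 and 20")
    path = []
    for order, item in enumerate(results[:limit], start=1):
        # Copy each dict so the caller's results are not mutated
        path.append({**item, "path_order": order, "level": level})
    return path


# Placeholder results, already sorted by descending score
ranked = [{"title": "A", "score": 0.9}, {"title": "B", "score": 0.7}]
path = to_reading_path(ranked, "beginner")
print([p["path_order"] for p in path])  # [1, 2]
```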

### MCP Server (Claude Code)

The `.mcp.json` file is already configured at the project root.
**Open a new Claude Code session in this folder**: the 5 tools are then available automatically.

| MCP tool | Description |
|---|---|
| `search_dharma(query, tags?, category?, limit?)` | Semantic search |
| `get_course(course_id)` | Full curriculum of a course |
| `list_courses()` | Lists the 16 structured courses |
| `find_by_teacher(teacher_slug)` | Resources by teacher |
| `get_reading_path(topic, level, limit)` | Guided reading path |
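
Both `/search` and `search_dharma` narrow results with metadata filters, which Qdrant expresses as a `must` list of conditions. A sketch that builds such a filter as a plain dict in Qdrant's JSON filter syntax (`match.value`, `range.gte`). The payload keys `category`, `tags` and `stars` are assumptions based on the parameter names, and requiring *every* requested tag is also an assumption; a single `{"key": "tags", "match": {"any": [...]}}` condition would match *any* of them instead:

```python
from __future__ import annotations

from typing import Any


def build_filter(
    category: str | None = None,
    tags: list[str] | None = None,
    min_stars: int | None = None,
) -> dict[str, Any] | None:
    """Return a Qdrant-style filter dict, or None when no filter is requested."""
    must: list[dict[str, Any]] = []
    if category:
        must.append({"key": "category", "match": {"value": category}})
    for tag in tags or []:
        # One condition per tag: on an array field each condition matches if the
        # array contains the value, so a point must carry all requested tags
        must.append({"key": "tags", "match": {"value": tag}})
    if min_stars is not None:
        must.append({"key": "stars", "range": {"gte": min_stars}})
    return {"must": must} if must else None


print(build_filter(category="canon", min_stars=4))
# {'must': [{'key': 'category', 'match': {'value': 'canon'}}, {'key': 'stars', 'range': {'gte': 4}}]}
```

The resulting dict has the shape of the `filter` object accepted by Qdrant's REST search endpoints; with the Python client the same conditions would be written with `models.Filter` and `models.FieldCondition`.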

**Available courses**: `an` `buddha` `buddhism` `chinese-primer` `ebts` `ethics` `form` `function` `imagery` `mn` `nibbana` `nibbana-mind-stilled` `pali-new-course` `pali-primer` `philosophy` `tranquility-and-insight`

**Teachers** (examples): `bodhi` `thanissaro` `ajahn-chah` `ajahn-brahm` `analayo` `sujato` `nanavira`

---

## Tests

```bash
# Unit tests (no Qdrant needed)
PYTHONPATH=$(pwd) search/.venv/bin/pytest search/tests/ -m "not integration" -v

# Full suite (requires Qdrant running with the indexed collection)
PYTHONPATH=$(pwd) search/.venv/bin/pytest search/tests/ -v

# Expected result: 56 passed
```

---

## Reindexing

If you add new content to `_content/`:

```bash
# Incremental reindex (upsert, safe)
PYTHONPATH=$(pwd) search/.venv/bin/python -m search.ingestion.ingest

# Full reindex (starts from scratch)
PYTHONPATH=$(pwd) search/.venv/bin/python -m search.ingestion.ingest --recreate
```
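An upsert only avoids duplicates if each document keeps the same point ID across runs. One common way to get that, sketched here as an assumption rather than what `ingest.py` actually does, is to derive a UUIDv5 from the file's path relative to the repository; Qdrant accepts UUID strings as point IDs. `point_id` and the example path are hypothetical:

```python
import uuid


def point_id(relative_path: str) -> str:
    """Stable point ID: the same file path always maps to the same UUID."""
    # Normalise Windows separators so the ID does not depend on the OS
    normalised = relative_path.replace("\\", "/")
    return str(uuid.uuid5(uuid.NAMESPACE_URL, normalised))


print(point_id("_content/example.md"))  # identical on every run
```

With IDs like these, re-running the incremental command overwrites existing points instead of adding copies. An upsert does not delete anything by itself, so unless the pipeline also removes stale IDs, a file deleted from `_content/` stays indexed until a `--recreate` run.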

---

## Tech stack

| Component | Technology | Rationale |
|---|---|---|
| Vector DB | **Qdrant** (Docker) | Native metadata filtering, HNSW, open source |
| Embeddings | **all-MiniLM-L6-v2** | 384 dimensions, runs locally, fast (~500 docs/s), free |
| API | **FastAPI** + uvicorn | Async, automatic OpenAPI docs, Pydantic v2 |
| MCP | **FastMCP** (MCP Python SDK) | stdio transport, works with Claude Code |
| Extraction | **python-frontmatter** | Reuses the existing Jekyll pipeline |
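
The search scores come from comparing embeddings: `all-MiniLM-L6-v2` produces 384-dimensional vectors, and the commit's embedder tests check that they are L2-normalised. For unit-length vectors, cosine similarity equals the plain dot product. A standard-library sketch of that identity, with toy 3-dimensional vectors standing in for real embeddings:

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (L2 norm = 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a, b = [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]
# Dot product of the normalised vectors equals the cosine of the raw vectors
dot_of_unit = sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
print(round(cosine(a, b), 6), round(dot_of_unit, 6))  # 0.96 0.96
```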

search/pytest.ini

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
[pytest]
markers =
    integration: tests requiring a running Qdrant with the indexed collection

search/tests/test_api.py

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
"""
FastAPI integration tests covering every endpoint.

Usage:
    PYTHONPATH=/path/to/buddhist-uni.github.io pytest search/tests/test_api.py -v
"""

import pytest
from fastapi.testclient import TestClient
from search.api.main import app

client = TestClient(app)


class TestHealth:
    def test_root(self):
        r = client.get("/")
        assert r.status_code == 200
        data = r.json()
        assert data["name"] == "Buddhist University Search API"
        assert "endpoints" in data

    @pytest.mark.integration  # needs Qdrant running with the indexed collection
    def test_health_ok(self):
        r = client.get("/health")
        assert r.status_code == 200
        data = r.json()
        assert data["status"] == "ok"
        assert data["indexed_documents"] >= 4000


@pytest.mark.integration
class TestSearchEndpoint:
    def test_basic_search(self):
        r = client.get("/search?q=meditation+mindfulness")
        assert r.status_code == 200
        data = r.json()
        assert data["query"] == "meditation mindfulness"
        assert data["total"] > 0
        assert len(data["results"]) > 0

    def test_search_with_limit(self):
        r = client.get("/search?q=nibbana&limit=3")
        assert r.status_code == 200
        data = r.json()
        assert len(data["results"]) <= 3

    def test_search_limit_max(self):
        r = client.get("/search?q=nibbana&limit=25")
        assert r.status_code == 422  # limit max=20

    def test_search_result_fields(self):
        r = client.get("/search?q=impermanence")
        data = r.json()
        result = data["results"][0]
        assert "score" in result
        assert "title" in result
        assert "category" in result
        assert "url" in result
        assert "tags" in result

    def test_search_filter_category(self):
        r = client.get("/search?q=anatta&category=canon")
        assert r.status_code == 200
        data = r.json()
        for result in data["results"]:
            assert result["category"] == "canon"

    def test_search_filter_multiple_tags(self):
        r = client.get("/search?q=compassion&tags=metta&tags=meditation")
        assert r.status_code == 200
        assert r.json()["total"] >= 0  # may be 0 if nothing matches exactly

    def test_search_missing_query(self):
        r = client.get("/search")
        assert r.status_code == 422

    def test_search_short_query(self):
        r = client.get("/search?q=a")
        assert r.status_code == 422  # min_length=2

    def test_search_scores_ordered(self):
        r = client.get("/search?q=pali+grammar&limit=8")
        data = r.json()
        scores = [res["score"] for res in data["results"]]
        assert scores == sorted(scores, reverse=True)


@pytest.mark.integration
class TestReadingPath:
    def test_basic_path(self):
        r = client.get("/reading-path?topic=karuna+compassion")
        assert r.status_code == 200
        data = r.json()
        assert len(data) > 0
        assert data[0]["path_order"] == 1

    def test_path_ordered(self):
        r = client.get("/reading-path?topic=meditation&level=beginner&limit=8")
        data = r.json()
        orders = [item["path_order"] for item in data]
        assert orders == list(range(1, len(data) + 1))

    def test_path_level_field(self):
        for level in ["beginner", "intermediate", "advanced"]:
            r = client.get(f"/reading-path?topic=nibbana&level={level}")
            assert r.status_code == 200
            for item in r.json():
                assert item["level"] == level

    def test_path_invalid_level(self):
        r = client.get("/reading-path?topic=nibbana&level=expert")
        assert r.status_code == 422

    def test_path_missing_topic(self):
        r = client.get("/reading-path")
        assert r.status_code == 422


@pytest.mark.integration
class TestCoursesEndpoint:
    def test_list_courses(self):
        r = client.get("/courses")
        assert r.status_code == 200
        data = r.json()
        assert len(data) >= 10
        ids = [c["id"] for c in data]
        assert "mn" in ids
        assert "pali-primer" in ids

    def test_list_courses_fields(self):
        r = client.get("/courses")
        for course in r.json():
            assert "id" in course
            assert "title" in course

    def test_get_course_mn(self):
        r = client.get("/courses/mn")
        assert r.status_code == 200
        data = r.json()
        assert "Majjhima" in data["title"]
        assert data["content_count"] > 0
        assert len(data["content"]) > 0

    def test_get_course_pali(self):
        r = client.get("/courses/pali-primer")
        assert r.status_code == 200
        data = r.json()
        assert "Pāl" in data["title"] or "Pali" in data["title"]

    def test_get_course_not_found(self):
        r = client.get("/courses/nonexistent-xyz")
        assert r.status_code == 404

    def test_course_content_fields(self):
        r = client.get("/courses/mn")
        data = r.json()
        item = data["content"][0]
        assert "title" in item
        assert "category" in item
        assert "url" in item


@pytest.mark.integration
class TestTeachersEndpoint:
    def test_get_teacher_bodhi(self):
        r = client.get("/teachers/bodhi")
        assert r.status_code == 200
        data = r.json()
        assert len(data) > 0

    def test_get_teacher_not_found(self):
        r = client.get("/teachers/nobody-xyz-unknown")
        assert r.status_code == 404

    def test_teacher_result_fields(self):
        r = client.get("/teachers/bodhi?limit=3")
        for item in r.json():
            assert "title" in item
            assert "url" in item
            assert "category" in item
