IVOA registry validator: OAI-PMH harvest checks, IVOA four-GET profile, VOResource XSD/XSLT validation.
Web UI colors and typography follow ivoa.net; design tokens live in assets/static/css/ivoa-theme.css.
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"
source .venv/bin/activate

Production (no reload):
benson
# Proxy headers are on by default (--proxy-headers). Disable with: benson --no-proxy-headers
# or: uvicorn benson.app:create_app --factory --host 0.0.0.0 --port 8000 --proxy-headers

Development (auto-reload on changes under src/ and assets/):
benson --reload
# or: BENSON_DEV=1 benson
# or: uvicorn benson.app:create_app --factory --host 0.0.0.0 --port 8000 --reload --reload-dir src --reload-dir assets

| Method | Path | Description |
|---|---|---|
| GET, POST | /validator, /regvalidate | Async harvest validation form |
| POST | /validator/jobs | Start validation job |
| GET | /oai | IVOA standards OAI-PMH catalog from assets/standards |
| GET | /list-publishers | Publishers OAI XML catalog |
| GET | /api/v1/registry/publishers | Publishers registry (JSON) |
| GET, POST | /api/v1/registry-validate/harvest | Harvest validation API |
| POST | /api/v1/registry-validate/voresource | Standalone VOResource validation (see below) |

Validate one or more VOResource / Registry Interface XML documents without running a full OAI harvest. Useful for checking a single record extracted from a ListRecords response, or any standalone registry metadata file.
Endpoint: POST /api/v1/registry-validate/voresource
Content type: multipart/form-data
| Field | Required | Description |
|---|---|---|
| record | One of record or recordURL | File upload; repeat the field for multiple files (up to 10). |
| recordURL | One of record or recordURL | Space-separated http/https URLs to fetch (file: URLs are rejected). |
| format | No | Response format: html (default), xml, or text. |
| show | No | Severity filter for results (default: fail warn rec). |

Validation always uses the bundled XSD catalog under assets/schemas/ plus checkVOResource.xsl. The harvest validator’s Use built-in XSD schemas flag does not apply here.
Upload a local file:
curl -sS -X POST 'http://localhost:8000/api/v1/registry-validate/voresource' \
-F 'format=xml' \
-F 'show=fail warn rec' \
-F 'record=@assets/standards/voresource.xml;type=text/xml'

Fetch by URL:
curl -sS -X POST 'http://localhost:8000/api/v1/registry-validate/voresource' \
-F 'format=xml' \
-F 'recordURL=https://example.org/registry/resource.xml'

Multiple files: repeat -F 'record=@…' up to ten times, or combine uploads with recordURL.
Normative behaviour (legacy servlet name /regvalidate/VOResourceValidater): docs/regvalidate-functional-contract.md §4. Sample response fixture and parity replay notes: docs/samples/voresource-validater/.
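The request rules above (repeatable record field, one space-separated recordURL value, http/https only) can be checked on the client before anything is sent. The following is a minimal sketch, not part of Benson: `build_voresource_form` and `MAX_UPLOADS` are hypothetical names, and the function only assembles the form fields; sending them is left to whichever HTTP client is in use.

```python
from urllib.parse import urlsplit

MAX_UPLOADS = 10  # upload limit stated in the field table


def build_voresource_form(files=(), urls=(), fmt="html", show="fail warn rec"):
    """Return (field, value) pairs for a multipart POST to the voresource endpoint.

    Hypothetical client-side helper: values for "record" are local file paths
    that the HTTP client is expected to open and attach.
    """
    files, urls = list(files), list(urls)
    if not files and not urls:
        raise ValueError("provide at least one record file or recordURL")
    if len(files) > MAX_UPLOADS:
        raise ValueError(f"at most {MAX_UPLOADS} record uploads are accepted")
    for url in urls:
        # file: URLs (and anything else that is not http/https) are rejected.
        if urlsplit(url).scheme not in ("http", "https"):
            raise ValueError(f"only http/https URLs are accepted: {url}")

    fields = [("format", fmt), ("show", show)]
    # The record field is repeated once per uploaded file.
    fields += [("record", path) for path in files]
    if urls:
        # recordURL carries all URLs in a single space-separated value.
        fields.append(("recordURL", " ".join(urls)))
    return fields
```

Rejecting a bad URL locally avoids a round trip, but the server remains the authority on what it accepts.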
| Variable | Default | Description |
|---|---|---|
| SCHEMA_ROOT | ./assets/schemas | Bundled XSD schemas |
| ASSETS_ROOT | ./assets/validate | XSLT validation stylesheets |
| STANDARDS_DIR | ./assets/standards | IVOA standards records (indexed at /oai) |
| OAI_REPOSITORY_NAME | IVOA Registry of Registries | OAI Identify repository name |
| OAI_ADMIN_EMAIL | registry@ivoa.net | OAI admin contact |
| OAI_REGISTRY_IDENTIFIER | ivo://ivoa.net/rofr | OAI registry identifier |
| OAI_MANAGED_AUTHORITY | ivoa.net | OAI managed authority |
| OAI_MAX_RECORDS | 100 | Max records per OAI list response |
| TEMPLATES_DIR | ./assets/templates | Jinja templates |
| STATIC_DIR | ./assets/static | Static assets |
| PUBLISHERS_DATA_DIR | ./data/publishers | Publishers registry data directory |
| PUBLISHERS_REGISTRY_FILE | ./data/publishers/publishers.json | Publishers registry JSON file |
| SEARCHABLES_CACHE_DIR | ./data/searchables | RegTAP CSV cache directory (registries.csv) |
| SEARCHABLES_CACHE_MAX_AGE_SEC | (unset) | Cache TTL; unset means no expiry |
| REGISTRATION_MAX_FAILURES | 0 | Max failures allowed for registration |
| REGISTRATION_MAX_WARNINGS | 999999 | Max warnings allowed for registration |
| REGISTRATION_REQUIRE_BUILTIN_SCHEMAS | on | Require built-in XSD schemas for registration |
| BENSON_PROXY_HEADERS | on | Enable proxy headers (fixes template url_for behind a reverse proxy) |
| FORWARDED_ALLOW_IPS | * | Trusted proxy IPs (tighten in production) |
| COMPLY_PATH | (unset) | Optional comply path |
| LOG_LEVEL | INFO | Logging level |
| BENSON_PARITY_JSON | off | Enable JSON parity mode (=1 to enable) |
| BENSON_EXPOSE_ERRORS | off | Expose error details in responses (debug only; =1 to enable) |

Deployers who operate a live Registry of Registries instance should run the publisher liveness check periodically:
benson check-publishers

The command reads PUBLISHERS_REGISTRY_FILE, sends one lightweight OAI-PMH Identify request to each registered publisher endpoint, and writes status metadata back into the publishers JSON file. It does not harvest records; its intent is to catch entries whose services have gone stale, become unreachable, changed identity, or stopped operating.
Typical usage is from cron or another deployment scheduler, using the same environment and volume mount as the running Benson service:
PUBLISHERS_REGISTRY_FILE=/data/publishers/publishers.json benson check-publishers

Useful options:
- --dry-run reports results without updating publishers.json.
- --json prints machine-readable results for logs or monitoring.
- --timeout SEC overrides PUBLISHERS_CHECK_TIMEOUT_SEC.
Any non-ok publisher causes a non-zero exit code unless --dry-run is used, which makes the command suitable for alerting. The main page displays the last check status and timestamp once the command has populated last_checked_at and check_status.
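For schedulers that prefer a Python entry point over a raw cron line, the exit-code behaviour can be wrapped as below. This is a sketch under assumptions: `check_publishers` and its `runner` parameter are hypothetical (the latter exists only so the call can be replaced in tests), and the JSON output is passed through unparsed because its layout is not described here.

```python
import subprocess


def check_publishers(dry_run=False, timeout_sec=None, runner=subprocess.run):
    """Run `benson check-publishers --json` and return (all_ok, raw_output)."""
    cmd = ["benson", "check-publishers", "--json"]
    if dry_run:
        cmd.append("--dry-run")
    if timeout_sec is not None:
        cmd += ["--timeout", str(timeout_sec)]

    result = runner(cmd, capture_output=True, text=True)
    # A non-zero exit code signals at least one non-ok publisher (or another
    # error). With --dry-run the exit code is not raised for non-ok
    # publishers, so the printed results have to be inspected instead.
    return result.returncode == 0, result.stdout
```

An alerting job would typically call `check_publishers()` and raise or notify when the first element of the returned pair is false.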
When the validator form enables Use built-in XSD schemas, phase 2 (IVOA four GETs) and phase 3 (harvested records) validate against the bundled schemas: the namespace map in src/benson/xml/catalog.py resolves each namespace to a file under SCHEMA_ROOT (default assets/schemas/).
OAI responses are validated in two steps: embedded description / metadata / about payloads are checked against the appropriate IVOA XSDs (Registry Interface records use benson-ivoa-bundle.xsd so xsi:type extensions such as vg:Registry resolve), then the OAI-PMH envelope is checked via benson-oai-bundle.xsd. Imports are resolved locally (no network). This matches the regvalidate functional contract intent; validating against OAI-v2.xsd alone is not sufficient for registry Identify responses that embed ri:Resource metadata.
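The first of the two steps depends on pulling the embedded payloads out of the OAI-PMH envelope. The sketch below shows only that extraction, using the standard library; `embedded_payloads` is a hypothetical helper, not Benson's implementation, and the XSD validation itself would need a schema-aware library such as lxml together with the bundles named above.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
PAYLOAD_WRAPPERS = ("description", "metadata", "about")


def embedded_payloads(oai_xml):
    """Return (wrapper_name, element) pairs for every embedded payload.

    Only wrappers in the OAI-PMH namespace are matched, so elements named
    "description" inside a payload in another namespace are not picked up.
    """
    root = ET.fromstring(oai_xml)
    payloads = []
    for wrapper in PAYLOAD_WRAPPERS:
        for holder in root.iter(f"{{{OAI_NS}}}{wrapper}"):
            # Each child of the wrapper is one payload to validate on its own
            # (step 1) before the whole envelope is validated (step 2).
            payloads.extend((wrapper, child) for child in holder)
    return payloads
```

Each returned element can then be serialized with `ET.tostring` and handed to the payload validator before the envelope check runs.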
Developer guide: docs/schemas-and-validation-assets.md — directory layout, bundle composition, namespace table, XSLT assets (assets/validate/), standards catalog (assets/standards/), and how each validation phase uses them.
After installing with .[dev] (see Run above):
pytest

On Debian/Ubuntu, install system libraries for lxml if needed: apt-get install libxml2 libxslt1.1.
Build and run locally:
docker build -t benson:local .
docker run --rm -p 8000:8000 benson:local

Pull from GitHub Container Registry (after CI has pushed an image):
docker login ghcr.io
docker pull ghcr.io/my-org/benson:latest

docker-compose.yml runs Benson with host directories for registry catalogue data:

| Host path | Container path | Purpose |
|---|---|---|
| ./data/searchables/ | /data/searchables | CSV exports of full searchable registries (RegTAP). Set SEARCHABLES_CACHE_DIR to this path. |
| ./data/publishers/ | /data/publishers | Registered publishing registries (publishers.json). Served as OAI XML at /list-publishers. |

Example layout:
data/
  searchables/
    registries.csv    # RegTAP sync export
  publishers/
    publishers.json   # canonical registry list (OAI XML generated at /list-publishers)

docker compose up --build

Then open http://localhost:8000/ (landing page) or http://localhost:8000/validator.
When the cache directory is empty, searchables are fetched live from SEARCHABLES_REGTAP_SYNC_URL on the first home page load (or via benson sync-searchables), and the CSV response is written to SEARCHABLES_CACHE_DIR/registries.csv (or SEARCHABLES_CACHE_FILE). Subsequent loads use the cache until SEARCHABLES_CACHE_MAX_AGE_SEC expires. An empty publishers.json is created automatically; register registries via the validator after a successful dry-run validation.
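The cache decision described above reduces to a small freshness test. The sketch below is an illustration only: `cache_is_fresh` is a hypothetical name, and measuring age by the file's modification time is an assumption about how the TTL is applied.

```python
import os
import time


def cache_is_fresh(cache_file, max_age_sec=None, now=None):
    """Decide whether the cached registries.csv can be used as-is."""
    if not os.path.isfile(cache_file):
        return False  # empty cache: a live fetch is needed
    if max_age_sec is None:
        return True  # SEARCHABLES_CACHE_MAX_AGE_SEC unset: no expiry
    now = time.time() if now is None else now
    # Assumption: cache age is measured from the file's modification time.
    return now - os.path.getmtime(cache_file) <= max_age_sec
```

A false result corresponds to the cases where a live fetch, or an explicit benson sync-searchables run, would repopulate the file.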
To warm the searchables cache without opening the home page:
SEARCHABLES_CACHE_DIR=./data/searchables benson sync-searchables

After a successful validation (zero failures, built-in XSD schemas enabled), the validator offers registration with the Registry of Registries. Submit the registry’s IVOA identifier and title; Benson stores the entry in publishers.json and serves it at /list-publishers.
Updates use the same flow: validate the registry’s current OAI endpoint (for example after a host or domain change), then submit the same IVOA identifier and an updated title if needed. Benson detects the existing listing and updates harvest_access_url and title instead of rejecting the submission. The original registered_at timestamp is preserved; updated_at records when the listing last changed.
Requirements for updates:
- Validation must pass under the same policy as new registration (REGISTRATION_MAX_FAILURES, built-in schemas, and so on).
- The live Identify identifier must match the stored IVOA identifier (prevents hijacking another listing).
- The validated endpoint must not already belong to a different registered identifier.
The IVOA identifier itself cannot be changed through this path; a registry with a new identity is a new listing.
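The register-or-update rules can be sketched as a single function. This is a hedged illustration rather than Benson's implementation: `apply_submission` is a hypothetical name, the in-memory layout (a list of dicts with an `identifier` key) is assumed, and setting `updated_at` on a brand-new listing is also an assumption. The field names harvest_access_url, title, registered_at, and updated_at come from the description above.

```python
def apply_submission(publishers, identifier, live_identifier, harvest_url, title, now):
    """Register a new listing or update an existing one, after validation passed.

    Returns "updated" or "registered"; raises ValueError when a rule is violated.
    """
    # The identifier reported by the live Identify response must match.
    if live_identifier != identifier:
        raise ValueError("live Identify identifier does not match the listing")

    # The endpoint must not already belong to a different identifier.
    for entry in publishers:
        if entry["harvest_access_url"] == harvest_url and entry["identifier"] != identifier:
            raise ValueError("endpoint already belongs to a different identifier")

    for entry in publishers:
        if entry["identifier"] == identifier:
            entry["harvest_access_url"] = harvest_url
            if title:
                entry["title"] = title
            entry["updated_at"] = now  # registered_at is preserved
            return "updated"

    publishers.append({
        "identifier": identifier,
        "harvest_access_url": harvest_url,
        "title": title,
        "registered_at": now,
        "updated_at": now,  # assumption: new listings start with both stamps
    })
    return "registered"
```

Because the lookup key is the identifier, there is no code path that changes it, which mirrors the rule that a new identity means a new listing.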
benson check-publishers may report URL drift in check_detail when a live harvest URL differs from the stored value. It does not update the catalogue automatically — re-validate and submit an update through the validator instead.
