feat: add GPU inference option to web UI by EruditionHerta · Pull Request #34 · OpenMOSS/MOSS-TTS-Nano

EruditionHerta · 2026-04-18T11:02:23Z

Summary

Allow users to select a CUDA device for inference in the web interface

Previously, the app forced CPU-only mode regardless of the --device flag

Now supports --device cuda, --device auto, and per-request device selection via the web UI dropdown
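The three --device values above could resolve to a concrete device roughly as follows. This is a minimal sketch, not the PR's code: the function name `resolve_startup_device` and its `cuda_available` parameter are hypothetical, and in the real app the availability flag would presumably come from a runtime check such as `torch.cuda.is_available()`. The PR does not say what happens when `--device cuda` is passed on a machine without a GPU, so raising an error here is an assumption.

```python
def resolve_startup_device(requested: str, cuda_available: bool) -> str:
    """Map a --device value to a concrete device string (illustrative only)."""
    requested = requested.strip().lower()
    if requested == "auto":
        # auto: GPU if available, else CPU (behavior described in the PR).
        return "cuda" if cuda_available else "cpu"
    if requested == "cpu":
        return "cpu"
    if requested == "cuda":
        if not cuda_available:
            # Assumption: fail loudly. The PR does not specify this case.
            raise RuntimeError("CUDA was requested but is not available")
        return "cuda"
    raise ValueError(f"unsupported device: {requested!r}")
```

Passing the availability flag in as an argument keeps the resolution logic testable on machines without a GPU.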

Changes

main(): Removed forced CPU override, added cuda/auto device resolution with CUDA availability detection

RequestRuntimeManager: Extended normalize_requested_execution_device() whitelist to accept cuda and cuda:N, added _build_cuda_runtime_locked() for dynamic GPU runtime creation

Web UI: Added Device selector dropdown (Default/CPU/CUDA:N) in Generation Options

API endpoints: /api/generate and /api/generate-stream/start accept execution_device parameter, /health reports cuda_available
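The extended whitelist for the execution_device parameter (cpu, cuda, cuda:N) can be illustrated with a small validator. This is a sketch under stated assumptions, not the body of normalize_requested_execution_device(): the name `normalize_execution_device_sketch` is hypothetical, and treating a missing value, an empty string, or "default" as "use the server's startup device" is an assumption based on the Default entry in the dropdown.

```python
import re
from typing import Optional

# Accepts "cuda" or "cuda:<index>" with a non-negative integer index.
_CUDA_PATTERN = re.compile(r"cuda(?::(\d+))?")


def normalize_execution_device_sketch(value: Optional[str]) -> Optional[str]:
    """Return a canonical device string, or None for the default device."""
    if value is None:
        return None
    text = value.strip().lower()
    if text in ("", "default"):
        # Assumption: these mean "no per-request override".
        return None
    if text == "cpu":
        return "cpu"
    match = _CUDA_PATTERN.fullmatch(text)
    if match:
        index = match.group(1)
        # Canonicalize the index so "cuda:01" and "cuda:1" compare equal.
        return "cuda" if index is None else f"cuda:{int(index)}"
    raise ValueError(f"unsupported execution_device: {value!r}")
```

Rejecting anything outside the whitelist before a runtime is built keeps malformed request values from reaching the GPU runtime creation path.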

Test Plan

python app.py --device cpu (default behavior unchanged)
python app.py --device cuda (GPU mode)
python app.py --device auto (auto-detect)
Web UI shows the Device selector with CUDA options when a GPU is available
Switching between CPU/CUDA in the web UI works correctly
Streaming generation works with GPU device
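For the Device selector checks in this test plan, the expected dropdown entries (Default/CPU/CUDA:N) can be derived from the number of visible GPUs. The helper below is a hypothetical sketch; `build_device_options` and its `cuda_device_count` argument are not names from the PR, and the real UI may build its options differently.

```python
from typing import List


def build_device_options(cuda_device_count: int) -> List[str]:
    """List the dropdown labels for a machine with the given GPU count."""
    options = ["Default", "CPU"]
    # One CUDA:N entry per visible GPU; none when the count is zero or negative.
    options += [f"CUDA:{i}" for i in range(max(cuda_device_count, 0))]
    return options
```

On a CPU-only machine this yields only Default and CPU, which matches the expectation that CUDA options appear only when a GPU is available.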

Allow users to select CUDA device for inference in the web interface. Previously the app was forced to CPU-only mode. Now supports:

- `--device cuda` to start on GPU
- `--device auto` to auto-detect (GPU if available, else CPU)
- Device selector dropdown in the web UI
- Dynamic GPU runtime creation when requested per-request
- `/health` endpoint reports `cuda_available`

EruditionHerta mentioned this pull request Apr 18, 2026

Can the web UI add CPU and GPU options? #33

Closed

EruditionHerta wants to merge 1 commit into OpenMOSS:main from EruditionHerta:feature/gpu-web-inference
