A comprehensive benchmarking framework for testing Open WebUI performance under various load conditions.
This benchmark suite is designed to:
- Measure concurrent user capacity - Test how many users can simultaneously use features like Channels
- Identify performance limits - Find the point where response times degrade
- Compare compute profiles - Test performance across different resource configurations
- Generate actionable reports - Provide detailed metrics and recommendations
- Python 3.11+
- Docker and Docker Compose
- A running Open WebUI instance (or use the provided Docker setup)
- Chromium browser (installed automatically via Playwright for UI benchmarks)
cd benchmark
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
# Install Playwright browsers (required for UI benchmarks)
playwright install chromiumCopy the example environment file and configure your admin credentials:
cp .env.example .env

Edit .env with your Open WebUI admin credentials:
OPEN_WEBUI_URL=http://localhost:8080
ADMIN_USER_EMAIL=your-admin@example.com
ADMIN_USER_PASSWORD=your-password

- Start Open WebUI with benchmark configuration:
cd docker
./run.sh default # Use the default compute profile (2 CPU, 8GB RAM)

- Run the benchmark:
# Run the default benchmark (chat-ui with auto-scaling), which automatically finds max sustainable users based on P95 response time
owb run
# Set a custom response time threshold (default: 1000ms)
owb run --response-threshold 2000
# Run with a fixed number of users (disables auto-scaling)
owb run -m 50
# Run with visible browsers for debugging
owb run --headed
owb run --headed --slow-mo 500 # Slow down for visual inspection

- List available benchmarks:
owb list

- Run other benchmarks:
# API-based chat benchmark (no browser)
owb run chat-api -m 50
# Channel API concurrency
owb run channels-api -m 50
# Channel WebSocket benchmark
owb run channels-ws -m 50
# Run all benchmarks
owb run all

- View results:
Results are organized by benchmark name and timestamp:
results/
└── chat_ui_concurrency/
└── 20260126_014205/
├── result.json # Detailed benchmark data
├── results.csv # Tabular results
└── summary.txt # Human-readable summary
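Because `pandas` is already part of the toolchain, a run's CSV output can be inspected directly. A minimal sketch (the path below is illustrative and the column names depend on the benchmark):

```python
import pandas as pd

# Illustrative path: substitute your own benchmark name and run timestamp.
df = pd.read_csv("results/chat_ui_concurrency/20260126_014205/results.csv")

print(df.head())      # peek at the recorded rows
print(df.describe())  # quick distribution summary of the numeric columns
```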
Compute profiles define the resource constraints for the Open WebUI container:
| Profile | CPUs | Memory | Use Case |
|---|---|---|---|
| `default` | 2 | 8GB | Local MacBook testing |
| `minimal` | 1 | 4GB | Testing lower bounds |
| `cloud_small` | 2 | 4GB | Small cloud VM |
| `cloud_medium` | 4 | 8GB | Medium cloud VM |
| `cloud_large` | 8 | 16GB | Large cloud VM |
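The exact schema of `compute_profiles.yaml` may differ, but conceptually each profile just pins the container's CPU and memory limits, roughly equivalent to Docker Compose settings like the following (illustrative mapping of the `default` profile, not an excerpt of the real file):

```yaml
# Illustrative only: how a "2 CPU / 8GB" profile translates to Compose resource limits.
services:
  open-webui:
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 8G
```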
List available profiles:
owb profiles

The `chat-api` benchmark tests concurrent AI chat performance via the OpenAI-compatible API:
- Creates test users and makes a model publicly available
- Each user sends chat requests via the `/api/chat` endpoint (see the sketch after this list)
- Measures response times, throughput, and error rates
- Tests the backend's ability to handle concurrent LLM requests
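Roughly what each simulated user does per request, sketched with `httpx` (the payload shape and auth header are assumptions; see the benchmark source for the real client):

```python
import time
import httpx

async def send_chat(base_url: str, token: str, model: str) -> float:
    """Send one chat request and return the response time in ms (illustrative only)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    async with httpx.AsyncClient(base_url=base_url, timeout=60.0) as client:
        start = time.perf_counter()
        resp = await client.post(
            "/api/chat",  # endpoint exercised by the chat-api benchmark
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        return (time.perf_counter() - start) * 1000
```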
Usage:
owb run chat-api -m 50 --model gpt-4o-mini

The `chat-ui` benchmark tests concurrent AI chat performance through the actual browser UI using Playwright. This is the default benchmark and runs in auto-scale mode by default.
Auto-scale mode (default):
- Progressively adds users until the P95 response time exceeds the threshold (see the sketch below)
- Automatically finds maximum sustainable concurrent users
- Reports performance at each level tested
Fixed mode:
- Test with a specific number of concurrent users
- Enabled by specifying `--max-users` / `-m`
How it works:
- Launches real Chromium browser instances (or contexts)
- Each browser logs in as a different user
- Sends chat messages and waits for streaming responses
- Measures actual user-experienced response times including rendering
- Tests full stack performance: UI, backend, and LLM together
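The auto-scaling logic, reduced to its essence. This is a sketch only: the real implementation lives in `benchmark/scenarios/chat_ui.py`, and the step size and `run_level` callable below are illustrative:

```python
def find_max_sustainable_users(run_level, threshold_ms=1000, step=20, max_users=200):
    """run_level(users) is assumed to run one level and return its P95 response time in ms."""
    max_sustainable = 0
    for users in range(step, max_users + 1, step):
        p95 = run_level(users)
        print(f"{users} users: P95 {p95:.0f} ms ({p95 / threshold_ms:.0%} of threshold)")
        if p95 > threshold_ms:
            break  # threshold exceeded; the previous level is the answer
        max_sustainable = users
    return max_sustainable
```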
Usage:
# Auto-scale mode (default) - finds max sustainable users
owb run
owb run --response-threshold 2000 # Custom threshold (default: 1000ms)
# Fixed mode - test specific user count
owb run -m 50
owb run -m 50 --model gpt-4o-mini
# Debugging options
owb run --headed # Visible browsers
owb run --headed --slow-mo 500 # Slow down for inspection

Configuration:
chat_ui:
  headless: true                # Run browsers in headless mode
  slow_mo: 0                    # Slow down operations by ms (debugging)
  viewport_width: 1280          # Browser viewport width
  viewport_height: 720          # Browser viewport height
  browser_timeout: 30000        # Default timeout in ms
  screenshot_on_error: true     # Capture screenshots on failure
  use_isolated_browsers: false  # Use separate browser instances vs contexts

Notes:
- Browser benchmarks require more resources than API benchmarks
- For high concurrency (50+), use headless mode and browser contexts (see the sketch after these notes)
- Headed mode is useful for debugging UI issues
- The benchmark measures actual streaming response detection
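The difference between browser contexts and isolated browsers, sketched with Playwright. This is simplified and not a drop-in excerpt of `benchmark/clients/browser_client.py`:

```python
from playwright.async_api import async_playwright

async def open_sessions(n_users: int, use_isolated_browsers: bool = False) -> None:
    """Open one browser session per simulated user (illustrative sketch)."""
    async with async_playwright() as p:
        if use_isolated_browsers:
            # One full Chromium process per user: stronger isolation, far more RAM/CPU.
            browsers = [await p.chromium.launch(headless=True) for _ in range(n_users)]
            pages = [await b.new_page() for b in browsers]
        else:
            # One Chromium process, one context (own cookies/storage) per user: much lighter.
            browser = await p.chromium.launch(headless=True)
            contexts = [
                await browser.new_context(viewport={"width": 1280, "height": 720})
                for _ in range(n_users)
            ]
            pages = [await c.new_page() for c in contexts]
        # ...log in, send messages, and measure before closing everything...
```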
The Channels benchmark tests concurrent user capacity in Open WebUI Channels:
- Creates a test channel
- Progressively adds users (10, 20, 30, ... up to max)
- Each user sends messages at a configured rate (see the pacing sketch after this list)
- Measures response times and error rates
- Identifies the maximum sustainable user count
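How the configured message rate translates into per-user pacing, as a minimal sketch (the `send_message` callable is a stand-in for the real client call):

```python
import asyncio
import random
import time

async def user_send_loop(send_message, message_frequency: float, sustain_time: float) -> None:
    """Send messages at roughly `message_frequency` per second for `sustain_time` seconds."""
    interval = 1.0 / message_frequency
    deadline = time.monotonic() + sustain_time
    while time.monotonic() < deadline:
        await send_message()
        # Small jitter so simulated users don't all fire in lock-step.
        await asyncio.sleep(interval * random.uniform(0.8, 1.2))
```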
Configuration options:
channels:
  max_concurrent_users: 100  # Maximum users to test
  user_step_size: 10         # Increment users by this amount
  sustain_time: 30           # Seconds to run at each level
  message_frequency: 0.5     # Messages per second per user

The `channels-ws` benchmark tests WebSocket scalability for real-time message delivery in Channels:
- Establishes WebSocket connections for multiple users
- Tests real-time message broadcasting
- Measures message delivery latency (see the sketch after this list)
- Identifies WebSocket connection limits
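A rough idea of what each WebSocket participant looks like, using `python-socketio`. The Socket.IO path, auth shape, and event name below are assumptions; see `benchmark/clients/websocket_client.py` for the real implementation:

```python
import time
import socketio  # python-socketio async client

async def connect_listener(url: str, token: str, latencies: list) -> socketio.AsyncClient:
    """Connect one listener and record delivery latency for incoming channel messages."""
    sio = socketio.AsyncClient()

    @sio.on("channel-events")  # hypothetical event name
    async def on_message(data):
        sent_at = data.get("sent_at")  # assumes the sender embeds a send timestamp
        if sent_at is not None:
            latencies.append((time.time() - sent_at) * 1000)  # delivery latency in ms

    await sio.connect(url, socketio_path="/ws/socket.io", auth={"token": token})
    return sio
```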
Configuration files are located in config/:
- `benchmark_config.yaml` - Main benchmark settings
- `compute_profiles.yaml` - Resource profiles for Docker containers
All configuration can be set via environment variables (loaded from .env file):
| Variable | Description | Default |
|---|---|---|
| `OPEN_WEBUI_URL` | Open WebUI URL for benchmarking | http://localhost:8080 |
| `OLLAMA_BASE_URL` | Ollama API URL | http://host.docker.internal:11434 |
| `ENABLE_CHANNELS` | Enable Channels feature | true |
| `ADMIN_USER_EMAIL` | Admin email | - |
| `ADMIN_USER_PASSWORD` | Admin password | - |
| `MAX_CONCURRENT_USERS` | Max concurrent users | 50 |
| `USER_STEP_SIZE` | User increment step | 10 |
| `SUSTAIN_TIME_SECONDS` | Test duration per level | 30 |
| `MESSAGE_FREQUENCY` | Messages/sec per user | 0.5 |
| `OPEN_WEBUI_PORT` | Container port | 8080 |
| `CPU_LIMIT` | CPU limit | 2.0 |
| `MEMORY_LIMIT` | Memory limit | 8g |
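For example, a heavier run against a larger container can be configured entirely through `.env` (the values shown are arbitrary examples, not recommendations):

```
MAX_CONCURRENT_USERS=100
USER_STEP_SIZE=20
SUSTAIN_TIME_SECONDS=60
CPU_LIMIT=4.0
MEMORY_LIMIT=16g
```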
- Create a new file in `benchmark/scenarios/`:
from benchmark.core.base import BaseBenchmark
from benchmark.core.metrics import BenchmarkResult
class MyNewBenchmark(BaseBenchmark):
    name = "My New Benchmark"
    description = "Tests something new"
    version = "1.0.0"

    async def setup(self) -> None:
        # Set up test environment
        pass

    async def run(self) -> BenchmarkResult:
        # Execute the benchmark
        # Use self.metrics to record timings
        return self.metrics.get_result(self.name)

    async def teardown(self) -> None:
        # Clean up
        pass

- Register the benchmark in `benchmark/cli.py`
- Add configuration options if needed in `config/benchmark_config.yaml`
from benchmark.core.metrics import MetricsCollector
metrics = MetricsCollector()
metrics.start()
# Time individual operations
with metrics.time_operation("my_operation"):
    await do_something()

# Or record manually
metrics.record_timing(
    operation="api_call",
    duration_ms=150.5,
    success=True,
)
metrics.stop()
result = metrics.get_result("My Benchmark")

| Metric | Description | Good Threshold |
|---|---|---|
| `avg_response_time_ms` | Average response time | < 2000ms |
| `p95_response_time_ms` | 95th percentile response time | < 3000ms |
| `error_rate_percent` | Percentage of failed requests | < 1% |
| `requests_per_second` | Throughput | > 10 |
- `*.json` - Detailed results for each benchmark run
- `benchmark_results_*.csv` - Combined results in CSV format
- `summary_*.txt` - Human-readable summary
The chat-ui benchmark in auto-scale mode reports:
- max_sustainable_users: Maximum users where P95 stays under threshold
- levels_tested: Performance data at each user count level
- % of Threshold: How close P95 is to the configured limit
Example auto-scale result:
Auto-Scale Results
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Users ┃ P95 (ms) ┃ Avg (ms) ┃ % of Threshold ┃ Errors ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 10 │ 731 │ 662 │ 37% │ 0.0% │
│ 30 │ 881 │ 748 │ 44% │ 0.0% │
│ 50 │ 1178 │ 1064 │ 59% │ 0.0% │
│ 70 │ 2133 │ 1854 │ 107% │ 0.8% │
└───────┴──────────┴──────────┴────────────────┴────────┘
P95 Threshold: 2000ms
Maximum Sustainable Users: 50
The channel benchmark reports:
- max_sustainable_users: Maximum users where performance thresholds are met
- results_by_level: Performance at each user count level
- tested_levels: All user counts that were tested
Example result analysis:
Users: 10 | P95: 150ms | Errors: 0% | ✓ PASS
Users: 20 | P95: 280ms | Errors: 0.1% | ✓ PASS
Users: 30 | P95: 520ms | Errors: 0.3% | ✓ PASS
Users: 40 | P95: 1200ms | Errors: 0.8% | ✓ PASS
Users: 50 | P95: 3500ms | Errors: 2.1% | ✗ FAIL
Maximum sustainable users: 40
benchmark/
├── benchmark/
│ ├── core/ # Core framework
│ │ ├── base.py # Base benchmark class
│ │ ├── config.py # Configuration management
│ │ ├── metrics.py # Metrics collection
│ │ └── runner.py # Benchmark orchestration
│ ├── clients/ # API clients
│ │ ├── http_client.py # HTTP/REST client
│ │ ├── websocket_client.py # WebSocket client
│ │ └── browser_client.py # Playwright browser automation
│ ├── scenarios/ # Benchmark implementations
│ │ ├── channels.py # Channel benchmarks
│ │ └── chat_ui.py # Browser-based chat benchmark
│ ├── utils/ # Utilities
│ │ └── docker.py # Docker management
│ └── cli.py # Command-line interface
├── config/ # Configuration files
├── docker/ # Docker Compose for benchmarking
└── results/ # Benchmark output organized by {benchmark}/{timestamp}/
The benchmark suite reuses Open WebUI dependencies where possible:
From Open WebUI:
- `httpx` - HTTP client
- `aiohttp` - Async HTTP
- `python-socketio` - WebSocket client
- `pydantic` - Data validation
- `pandas` - Data analysis
Benchmark-specific:
- `playwright` - Browser automation for UI testing
- `locust` - Load testing (optional, for advanced scenarios)
- `rich` - Terminal output
- `docker` - Docker SDK
- `matplotlib` - Plotting results
- Connection refused: Ensure Open WebUI is running and accessible
- Authentication errors: Check admin credentials in config
- Docker resource errors: Ensure Docker has enough resources allocated
- WebSocket timeout: Increase `websocket_timeout` in config
- Browser launch failures: Run `playwright install chromium` to install browsers
- Login timeout in browser tests: Check that `.env` has the correct `ADMIN_USER_NAME` (with quotes if it contains spaces)
- High browser concurrency fails: Use headless mode (the default) and ensure sufficient system resources
Set logging level to DEBUG:
export BENCHMARK_LOG_LEVEL=DEBUG
owb run channels

When adding new benchmarks:
- Follow the `BaseBenchmark` interface
- Add tests for the new benchmark
- Update configuration schema if needed
- Add documentation to this README
MIT License - See LICENSE file