Open WebUI Benchmark Suite

A comprehensive benchmarking framework for testing Open WebUI performance under various load conditions.

Overview

This benchmark suite is designed to:

  1. Measure concurrent user capacity - Test how many users can simultaneously use features like Channels
  2. Identify performance limits - Find the point where response times degrade
  3. Compare compute profiles - Test performance across different resource configurations
  4. Generate actionable reports - Provide detailed metrics and recommendations

Quick Start

Prerequisites

  • Python 3.11+
  • Docker and Docker Compose
  • A running Open WebUI instance (or use the provided Docker setup)
  • Chromium browser (installed automatically via Playwright for UI benchmarks)

Installation

cd benchmark
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

# Install Playwright browsers (required for UI benchmarks)
playwright install chromium

Configuration

Copy the example environment file and configure your admin credentials:

cp .env.example .env

Edit .env with your Open WebUI admin credentials:

OPEN_WEBUI_URL=http://localhost:8080
ADMIN_USER_EMAIL=your-admin@example.com
ADMIN_USER_PASSWORD=your-password
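
Before starting a run, it can save time to confirm the target instance is reachable from the benchmark host. A minimal sketch with httpx (the /health endpoint is an assumption here; a plain GET to the base URL works just as well):

# Quick reachability check before benchmarking (illustrative sketch)
import os

import httpx

url = os.environ.get("OPEN_WEBUI_URL", "http://localhost:8080")

# /health is assumed as a lightweight liveness endpoint; fall back to a
# plain GET against the base URL if your deployment does not expose it.
resp = httpx.get(f"{url}/health", timeout=10)
print(f"{url} -> HTTP {resp.status_code}")
resp.raise_for_status()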

Running Benchmarks

  1. Start Open WebUI with benchmark configuration:
cd docker
./run.sh default  # Use the default compute profile (2 CPU, 8GB RAM)
  2. Run the benchmark:
# Run the default benchmark (chat-ui with auto-scaling), which automatically finds max sustainable users based on P95 response time
owb run

# Set a custom response time threshold (default: 1000ms)
owb run --response-threshold 2000

# Run with a fixed number of users (disables auto-scaling)
owb run -m 50

# Run with visible browsers for debugging
owb run --headed
owb run --headed --slow-mo 500  # Slow down for visual inspection
  3. List available benchmarks:
owb list
  4. Run other benchmarks:
# API-based chat benchmark (no browser)
owb run chat-api -m 50

# Channel API concurrency
owb run channels-api -m 50

# Channel WebSocket benchmark  
owb run channels-ws -m 50

# Run all benchmarks
owb run all
  5. View results:

Results are organized by benchmark name and timestamp:

results/
└── chat_ui_concurrency/
    └── 20260126_014205/
        ├── result.json    # Detailed benchmark data
        ├── results.csv    # Tabular results
        └── summary.txt    # Human-readable summary

Compute Profiles

Compute profiles define the resource constraints for the Open WebUI container:

| Profile      | CPUs | Memory | Use Case              |
|--------------|------|--------|-----------------------|
| default      | 2    | 8GB    | Local MacBook testing |
| minimal      | 1    | 4GB    | Testing lower bounds  |
| cloud_small  | 2    | 4GB    | Small cloud VM        |
| cloud_medium | 4    | 8GB    | Medium cloud VM       |
| cloud_large  | 8    | 16GB   | Large cloud VM        |

List available profiles:

owb profiles
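
These limits map onto standard Docker container constraints. Purely for illustration, applying the default profile's limits through the Docker SDK might look like the sketch below (the suite itself launches Open WebUI via Docker Compose under docker/, so this is not its actual launch path, and the image name is an assumption):

# Illustrative only: how a profile's limits translate to container settings.
# The benchmark suite itself uses Docker Compose (see docker/run.sh).
import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/open-webui/open-webui:main",  # image name assumed for illustration
    detach=True,
    ports={"8080/tcp": 8080},
    nano_cpus=2_000_000_000,  # 2 CPUs (default profile)
    mem_limit="8g",           # 8GB RAM (default profile)
)
print(container.short_id)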

Available Benchmarks

Chat API Concurrency (chat-api)

Tests concurrent AI chat performance via the OpenAI-compatible API:

  • Creates test users and makes a model publicly available
  • Each user sends chat requests via the /api/chat endpoint
  • Measures response times, throughput, and error rates
  • Tests the backend's ability to handle concurrent LLM requests

Usage:

owb run chat-api -m 50 --model gpt-4o-mini
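
Outside the CLI, a single request of the kind each simulated user issues can be approximated with httpx. The sketch below uses the OpenAI-style message format and the /api/chat path mentioned above; treat both as illustrative and adjust for your deployment:

# Illustrative single chat request; the benchmark drives many of these concurrently
import asyncio
import time

import httpx

async def send_chat(base_url: str, token: str) -> float:
    """Send one chat request and return the response time in milliseconds."""
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
    }
    async with httpx.AsyncClient(base_url=base_url, timeout=60) as client:
        start = time.perf_counter()
        resp = await client.post(
            "/api/chat",  # endpoint as described above; adjust if yours differs
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        return (time.perf_counter() - start) * 1000

# asyncio.run(send_chat("http://localhost:8080", "YOUR_API_TOKEN"))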

Chat UI Concurrency (chat-ui) - Default

Tests concurrent AI chat performance through actual browser UI using Playwright. This is the default benchmark and runs in auto-scale mode by default.

Auto-scale mode (default):

  • Progressively adds users until P95 response time exceeds threshold
  • Automatically finds maximum sustainable concurrent users
  • Reports performance at each level tested

Fixed mode:

  • Test with a specific number of concurrent users
  • Enabled by specifying --max-users / -m

How it works:

  • Launches real Chromium browser instances (or contexts)
  • Each browser logs in as a different user
  • Sends chat messages and waits for streaming responses
  • Measures actual user-experienced response times including rendering
  • Tests full stack performance: UI, backend, and LLM together
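
Conceptually, the auto-scale loop steps the user count upward until the P95 response time crosses the threshold; a simplified sketch of that logic (not the benchmark's actual implementation, and run_level is a hypothetical async callable):

# Simplified sketch of the auto-scaling logic: increase the user count in steps
# until the P95 response time exceeds the configured threshold.
async def auto_scale(run_level, step: int = 20, threshold_ms: float = 1000.0,
                     max_users: int = 200) -> int:
    """run_level(n) runs one level with n concurrent users and returns its P95 in ms."""
    max_sustainable = 0
    users = step
    while users <= max_users:
        p95 = await run_level(users)
        print(f"{users} users -> P95 {p95:.0f} ms ({p95 / threshold_ms:.0%} of threshold)")
        if p95 > threshold_ms:
            break
        max_sustainable = users
        users += step
    return max_sustainable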

Usage:

# Auto-scale mode (default) - finds max sustainable users
owb run
owb run --response-threshold 2000  # Custom threshold (default: 1000ms)

# Fixed mode - test specific user count
owb run -m 50
owb run -m 50 --model gpt-4o-mini

# Debugging options
owb run --headed                    # Visible browsers
owb run --headed --slow-mo 500      # Slow down for inspection

Configuration:

chat_ui:
  headless: true              # Run browsers in headless mode
  slow_mo: 0                  # Slow down operations by ms (debugging)
  viewport_width: 1280        # Browser viewport width
  viewport_height: 720        # Browser viewport height
  browser_timeout: 30000      # Default timeout in ms
  screenshot_on_error: true   # Capture screenshots on failure
  use_isolated_browsers: false # Use separate browser instances vs contexts

Notes:

  • Browser benchmarks require more resources than API benchmarks
  • For high concurrency (50+), use headless mode and browser contexts
  • Headed mode is useful for debugging UI issues
  • The benchmark measures actual streaming response detection

Channel Concurrency (channels-api)

Tests concurrent user capacity in Open WebUI Channels:

  • Creates a test channel
  • Progressively adds users (10, 20, 30, ... up to max)
  • Each user sends messages at a configured rate
  • Measures response times and error rates
  • Identifies the maximum sustainable user count

Configuration options:

channels:
  max_concurrent_users: 100  # Maximum users to test
  user_step_size: 10         # Increment users by this amount
  sustain_time: 30           # Seconds to run at each level
  message_frequency: 0.5     # Messages per second per user
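
Together, message_frequency and sustain_time define each simulated user's send loop at a given level; roughly, each user does something like the following (an illustrative sketch, with send_message standing in for the actual API call):

# Rough sketch of one simulated channel user: send messages at a fixed rate
# for the sustain period and record each send's latency in milliseconds.
import asyncio
import time

async def simulate_user(send_message, message_frequency: float = 0.5,
                        sustain_time: float = 30.0) -> list[float]:
    """send_message() posts one channel message (hypothetical async callable)."""
    latencies: list[float] = []
    interval = 1.0 / message_frequency  # 0.5 msg/s -> one message every 2 seconds
    deadline = time.monotonic() + sustain_time
    while time.monotonic() < deadline:
        start = time.perf_counter()
        await send_message()
        latencies.append((time.perf_counter() - start) * 1000)
        await asyncio.sleep(interval)
    return latencies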

Channel WebSocket (channels-ws)

Tests WebSocket scalability for real-time message delivery in Channels:

  • Establishes WebSocket connections for multiple users
  • Tests real-time message broadcasting
  • Measures message delivery latency
  • Identifies WebSocket connection limits
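
For a sense of what is being measured, a bare-bones round-trip check with python-socketio might look like the sketch below; the event names and auth payload are placeholders, not Open WebUI's actual protocol (the real scenario lives in benchmark/scenarios/):

# Bare-bones sketch: connect a Socket.IO client and time one round trip.
# Event names and the auth payload are placeholders for illustration.
import asyncio
import time

import socketio

async def measure_roundtrip(url: str, token: str) -> float:
    sio = socketio.AsyncClient()
    received = asyncio.Event()

    @sio.on("channel-events")  # placeholder event name
    async def on_event(data):
        received.set()

    await sio.connect(url, auth={"token": token}, transports=["websocket"])
    start = time.perf_counter()
    await sio.emit("channel-message", {"content": "ping"})  # placeholder emit
    await asyncio.wait_for(received.wait(), timeout=10)
    latency_ms = (time.perf_counter() - start) * 1000
    await sio.disconnect()
    return latency_ms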

Configuration

Configuration files are located in config/:

  • benchmark_config.yaml - Main benchmark settings
  • compute_profiles.yaml - Resource profiles for Docker containers

Environment Variables

All configuration can be set via environment variables (loaded from .env file):

| Variable             | Description                     | Default                           |
|----------------------|---------------------------------|-----------------------------------|
| OPEN_WEBUI_URL       | Open WebUI URL for benchmarking | http://localhost:8080             |
| OLLAMA_BASE_URL      | Ollama API URL                  | http://host.docker.internal:11434 |
| ENABLE_CHANNELS      | Enable the Channels feature     | true                              |
| ADMIN_USER_EMAIL     | Admin email                     | -                                 |
| ADMIN_USER_PASSWORD  | Admin password                  | -                                 |
| MAX_CONCURRENT_USERS | Max concurrent users            | 50                                |
| USER_STEP_SIZE       | User increment step             | 10                                |
| SUSTAIN_TIME_SECONDS | Test duration per level (s)     | 30                                |
| MESSAGE_FREQUENCY    | Messages per second per user    | 0.5                               |
| OPEN_WEBUI_PORT      | Container port                  | 8080                              |
| CPU_LIMIT            | CPU limit                       | 2.0                               |
| MEMORY_LIMIT         | Memory limit                    | 8g                                |

Adding a New Benchmark

  1. Create a new file in benchmark/scenarios/:
from benchmark.core.base import BaseBenchmark
from benchmark.core.metrics import BenchmarkResult

class MyNewBenchmark(BaseBenchmark):
    name = "My New Benchmark"
    description = "Tests something new"
    version = "1.0.0"
    
    async def setup(self) -> None:
        # Set up test environment
        pass
    
    async def run(self) -> BenchmarkResult:
        # Execute the benchmark
        # Use self.metrics to record timings
        return self.metrics.get_result(self.name)
    
    async def teardown(self) -> None:
        # Clean up
        pass
  2. Register the benchmark in benchmark/cli.py

  3. Add configuration options if needed in config/benchmark_config.yaml

Custom Metrics Collection

from benchmark.core.metrics import MetricsCollector

metrics = MetricsCollector()
metrics.start()

# Time individual operations
with metrics.time_operation("my_operation"):
    await do_something()

# Or record manually
metrics.record_timing(
    operation="api_call",
    duration_ms=150.5,
    success=True,
)

metrics.stop()
result = metrics.get_result("My Benchmark")

Understanding Results

Key Metrics

| Metric                | Description                   | Good Threshold |
|-----------------------|-------------------------------|----------------|
| avg_response_time_ms  | Average response time         | < 2000ms       |
| p95_response_time_ms  | 95th percentile response time | < 3000ms       |
| error_rate_percent    | Percentage of failed requests | < 1%           |
| requests_per_second   | Throughput                    | > 10           |
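
The percentile figures can be reproduced from raw timings with the standard library, for example:

# Quick sketch: deriving the reported aggregates from a list of raw timings (ms)
from statistics import mean, quantiles

timings_ms = [640, 655, 670, 690, 710, 725, 740, 790, 850, 1180]

avg_response_time_ms = mean(timings_ms)
# quantiles(..., n=100) yields the 1st..99th percentile cut points; index 94 is P95
p95_response_time_ms = quantiles(timings_ms, n=100)[94]

print(f"avg={avg_response_time_ms:.0f}ms  p95={p95_response_time_ms:.0f}ms")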

Result Files

  • *.json - Detailed results for each benchmark run
  • benchmark_results_*.csv - Combined results in CSV format
  • summary_*.txt - Human-readable summary
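
For post-processing, a run's output can be pulled into pandas and compared across runs; a small sketch (the exact JSON fields and CSV columns vary by benchmark, so inspect the files first):

# Small sketch: load one run's output for further analysis.
# Exact JSON fields and CSV columns vary by benchmark; inspect the files first.
import json
from pathlib import Path

import pandas as pd

run_dir = Path("results/chat_ui_concurrency/20260126_014205")  # example run from above

detail = json.loads((run_dir / "result.json").read_text())
table = pd.read_csv(run_dir / "results.csv")

print(detail.get("max_sustainable_users"))
print(table.describe())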

Interpreting Chat UI Benchmark Results

The chat-ui benchmark in auto-scale mode reports:

  • max_sustainable_users: Maximum users where P95 stays under threshold
  • levels_tested: Performance data at each user count level
  • % of Threshold: How close P95 is to the configured limit

Example auto-scale result:

                   Auto-Scale Results                    
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Users ┃ P95 (ms) ┃ Avg (ms) ┃ % of Threshold ┃ Errors ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│    10 │      731 │      662 │            37% │   0.0% │
│    30 │      881 │      748 │            44% │   0.0% │
│    50 │     1178 │     1064 │            59% │   0.0% │
│    70 │     2133 │     1854 │           107% │   0.8% │
└───────┴──────────┴──────────┴────────────────┴────────┘

P95 Threshold: 2000ms
Maximum Sustainable Users: 50

Interpreting Channel Benchmark Results

The channel benchmark reports:

  • max_sustainable_users: Maximum users where performance thresholds are met
  • results_by_level: Performance at each user count level
  • tested_levels: All user counts that were tested

Example result analysis:

Users: 10  | P95: 150ms  | Errors: 0%    | ✓ PASS
Users: 20  | P95: 280ms  | Errors: 0.1%  | ✓ PASS
Users: 30  | P95: 520ms  | Errors: 0.3%  | ✓ PASS
Users: 40  | P95: 1200ms | Errors: 0.8%  | ✓ PASS
Users: 50  | P95: 3500ms | Errors: 2.1%  | ✗ FAIL

Maximum sustainable users: 40

Architecture

benchmark/
├── benchmark/
│   ├── core/           # Core framework
│   │   ├── base.py     # Base benchmark class
│   │   ├── config.py   # Configuration management
│   │   ├── metrics.py  # Metrics collection
│   │   └── runner.py   # Benchmark orchestration
│   ├── clients/        # API clients
│   │   ├── http_client.py      # HTTP/REST client
│   │   ├── websocket_client.py # WebSocket client
│   │   └── browser_client.py   # Playwright browser automation
│   ├── scenarios/      # Benchmark implementations
│   │   ├── channels.py # Channel benchmarks
│   │   └── chat_ui.py  # Browser-based chat benchmark
│   ├── utils/          # Utilities
│   │   └── docker.py   # Docker management
│   └── cli.py          # Command-line interface
├── config/             # Configuration files
├── docker/             # Docker Compose for benchmarking
└── results/            # Benchmark output organized by {benchmark}/{timestamp}/

Dependencies

The benchmark suite reuses Open WebUI dependencies where possible:

From Open WebUI:

  • httpx - HTTP client
  • aiohttp - Async HTTP
  • python-socketio - WebSocket client
  • pydantic - Data validation
  • pandas - Data analysis

Benchmark-specific:

  • playwright - Browser automation for UI testing
  • locust - Load testing (optional, for advanced scenarios)
  • rich - Terminal output
  • docker - Docker SDK
  • matplotlib - Plotting results

Troubleshooting

Common Issues

  1. Connection refused: Ensure Open WebUI is running and accessible
  2. Authentication errors: Check admin credentials in config
  3. Docker resource errors: Ensure Docker has enough resources allocated
  4. WebSocket timeout: Increase websocket_timeout in config
  5. Browser launch failures: Run playwright install chromium to install browsers
  6. Login timeout in browser tests: Check that .env contains the correct admin credentials (quote values that contain spaces)
  7. High browser concurrency fails: Run in headless mode (the default), use browser contexts, and ensure sufficient system resources

Debug Mode

Set logging level to DEBUG:

export BENCHMARK_LOG_LEVEL=DEBUG
owb run channels-api

Contributing

When adding new benchmarks:

  1. Follow the BaseBenchmark interface
  2. Add tests for the new benchmark
  3. Update configuration schema if needed
  4. Add documentation to this README

License

MIT License - See LICENSE file
