Api/generate endpoint #1920

gl2007 · 2025-02-05T05:57:41Z

gl2007
Feb 5, 2025

For some reasons I can't explain Ollama's performance on my HP workstation sucks as compared to this for the same model. However, I want to use structured output which Ollama seems to support but not sure if this repo supports it?

If anyone knows and has been able to get it working, please give some details. Appreciate any help!

onestardao · 2025-07-29T08:55:26Z

onestardao
Jul 29, 2025

You’re right — structured outputs are supported in Ollama, but whether this repo replicates that depends on how the inference server handles formatting/parsing under the hood.

For structured output (like JSON), one thing I’ve done in similar setups is manually enforce formatting in the prompt — something like:
Respond only in valid JSON. Use keys: message, status, tokens.
But if you're relying on automatic formatting (like with function_calling or json_mode), this repo might not mirror Ollama's behavior exactly, since it's closer to raw inference bindings.

As for Ollama vs this on the same model — hardware aside, the runtime stack (backends, quantization, server flags) can affect perf a lot more than it looks on paper. Feel free to drop more details if you want someone to cross-check!

— passing by on GitHub patrol

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Api/generate endpoint #1920

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Api/generate endpoint #1920

Uh oh!

gl2007 Feb 5, 2025

Replies: 1 comment

Uh oh!

onestardao Jul 29, 2025

gl2007
Feb 5, 2025

onestardao
Jul 29, 2025