SGLang

Developer	LMSYS
Initial release	January 17, 2024; 2 years ago
Written in	Python, Rust, CUDA, C++
Type	Large language model inference engine
License	Apache License 2.0
Website	sglang.io
Repository	github.com/sgl-project/sglang

SGLang (short for Structured Generation Language) is an open-source framework for programming and serving large language models and multimodal models. It was introduced by researchers affiliated with LMSYS^[1] and other institutions as a system combining a Python-embedded language for structured generation with a runtime for high-throughput inference.^[2]^[3]^[4]

The project is designed for low-latency, high-throughput inference workloads. Its documentation describes support for structured outputs, speculative decoding, continuous batching, quantization, and OpenAI-compatible APIs.^[5]

History

SGLang was publicly introduced in January 2024 by researchers affiliated with Stanford, UC Berkeley, Texas A&M, and Shanghai Jiao Tong University.^[2] Its academic description later appeared in the proceedings of NeurIPS 2024.^[3] In January 2026, TechCrunch reported that contributors associated with the project had formed the startup RadixArk to commercialize services around SGLang while continuing its open-source development.^[6]^[7]

Architecture

According to the NeurIPS paper, SGLang consists of two main components: a front-end language embedded in Python and a back-end runtime for executing language model programs efficiently.^[3] The front end provides primitives for generation, selection, and parallel control flow, while the runtime uses a set of optimizations intended to reduce repeated computation and improve throughput.^[3]

Among the techniques described by the project are RadixAttention for reusing key–value cache state across multiple generation calls, compressed finite-state machines for faster constrained decoding, and speculative execution for API-based models.^[3] The current documentation also describes support for serving both language models and multimodal models across a range of hardware back ends.^[5]
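The prefix-reuse idea behind RadixAttention can be illustrated with a minimal sketch. It is not SGLang's implementation: it assumes a plain per-token trie rather than a compressed radix tree, stores no actual key–value tensors, and omits eviction. The names `PrefixCache`, `match_prefix`, and `tokens_to_compute` are hypothetical and chosen only for illustration.

```python
class PrefixCacheNode:
    """One node per cached token (hypothetical; a real radix tree compresses runs of tokens)."""

    def __init__(self):
        self.children = {}  # token id -> PrefixCacheNode


class PrefixCache:
    """Toy stand-in for a KV cache index keyed by token prefixes."""

    def __init__(self):
        self.root = PrefixCacheNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached state."""
        node = self.root
        matched = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            matched += 1
        return matched

    def insert(self, tokens):
        """Record that state for every prefix of `tokens` is now cached."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, PrefixCacheNode())


def tokens_to_compute(cache, tokens):
    """Count the tokens that still need computation for this request."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return len(tokens) - reused


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]  # token ids shared by both requests

first = tokens_to_compute(cache, system_prompt + [10, 11])
second = tokens_to_compute(cache, system_prompt + [20, 21, 22])
print(first, second)
```

In this sketch the first request computes all six tokens, while the second reuses the four shared prompt tokens and computes only its three new ones.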

References

[1] "LMSYS". GitHub. GitHub, Inc. Retrieved April 22, 2026.
[2] "Fast and Expressive LLM Inference with RadixAttention and SGLang". LMSYS Org. January 17, 2024. Retrieved April 19, 2026.
[3] Zheng, Lianmin; Yin, Liangsheng; Xie, Zhiqiang; Sun, Chuyue; Huang, Jeff; Yu, Cody Hao; Cao, Shiyi; Kozyrakis, Christos; Stoica, Ion; Gonzalez, Joseph E.; Barrett, Clark; Sheng, Ying (2024). SGLang: Efficient Execution of Structured Language Model Programs (PDF). Advances in Neural Information Processing Systems 37. Retrieved April 19, 2026.
[4] "SGLang". UC Berkeley Sky Computing Lab. April 25, 2024. Retrieved April 22, 2026.
[5] "SGLang Documentation". SGLang. Retrieved April 19, 2026.
[6] Hu, Krystal (January 21, 2026). "Sources: Project SGLang spins out as RadixArk with $400M valuation as inference market explodes". TechCrunch. Retrieved April 19, 2026.
[7] R, Vignesh (January 23, 2026). "From Berkeley lab to $400M startup: SGLang becomes RadixArk". TFN. Retrieved April 22, 2026.
