SGLang
Overview
SGLang is a structured generation language for LLM inference, built with RadixAttention for efficient prefix caching and continuous batching. Developed by SGLang Team (LMSYS), it accelerates complex inference pipelines.
Key Features
- RadixAttention: KV cache reuse across requests sharing prefixes
- Structured output: Grammar-guided decoding for JSON, code, etc.
- Chain-of-thought: Native support for reasoning traces
- OpenAI-compatible API: Drop-in deployment
Relationship to Other Projects
- Competes directly with vLLM in the LLM serving space
- Used by DeepSeek and Mistral for production serving