ai | llm-inference | open-source · type: entity · created: 2026-04-27 · updated: 2026-04-27

SGLang

llm-inference | vllm

Overview

SGLang is a structured generation language and serving framework for LLM inference. It pairs RadixAttention, which caches and reuses KV entries across requests that share a prefix, with continuous batching for high throughput. Developed by the SGLang team at LMSYS, it accelerates complex inference pipelines such as multi-call and structured-output workloads.
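To make the prefix-reuse idea concrete, here is a minimal, illustrative sketch of a RadixAttention-style prefix cache as a radix tree over token ids. This is an assumption-laden toy, not SGLang's implementation, which manages GPU KV-cache blocks with eviction policies; class and method names here are invented for illustration.

```python
# Toy radix-tree prefix cache (illustrative; SGLang's real RadixAttention
# operates on GPU KV-cache blocks, not Python token lists).
class RadixNode:
    def __init__(self):
        self.children = {}   # token id -> RadixNode
        self.cached = False  # True if KV entries exist up to this node


class PrefixCache:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, tokens):
        """Record that KV entries for this token sequence are cached."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            node.cached = True

    def longest_cached_prefix(self, tokens):
        """Return how many leading tokens can reuse cached KV entries."""
        node, matched = self.root, 0
        for t in tokens:
            nxt = node.children.get(t)
            if nxt is None or not nxt.cached:
                break
            node, matched = nxt, matched + 1
        return matched


cache = PrefixCache()
cache.insert([1, 2, 3, 4])                        # e.g. a shared system prompt
hit = cache.longest_cached_prefix([1, 2, 3, 9])   # new request, same opening
# Only the tokens after position `hit` need fresh prefill computation.
```

A second request sharing the first three tokens gets `hit == 3`, so only the divergent suffix is recomputed; this is the mechanism that makes few-shot prompts and shared system prompts cheap across many requests.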

Key Features

  • RadixAttention: KV cache reuse across requests sharing prefixes
  • Structured output: Grammar-guided decoding for JSON, code, etc.
  • Chain-of-thought: Native support for reasoning traces
  • OpenAI-compatible API: Drop-in deployment
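The last two features above can be sketched together: a request to an OpenAI-compatible endpoint that constrains the output with a JSON schema. The payload below follows the OpenAI chat-completions shape; the server command in the comment, the port, and the exact `response_format` field names are assumptions to verify against the SGLang server docs for your version.

```python
import json

# Hypothetical request for an SGLang server started along the lines of:
#   python -m sglang.launch_server --model-path <model> --port 30000
# Field names follow the OpenAI chat-completions schema; the json_schema
# response_format is one way grammar-guided decoding is exposed (assumed).
payload = {
    "model": "default",
    "messages": [
        {"role": "user", "content": "Extract the city mentioned, as JSON."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
}

# POST this body to http://localhost:30000/v1/chat/completions
body = json.dumps(payload)
```

Because the endpoint mirrors OpenAI's, existing OpenAI client code can usually be pointed at the SGLang server by changing only the base URL, which is what "drop-in deployment" refers to.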

Relationship to Other Projects

  • Competes directly with vLLM in the LLM serving space
  • Used by DeepSeek and Mistral for production serving
