Apache TVM
ml-inference-compilation | llm-inference
Overview
Apache TVM is an open-source compiler stack for machine learning that optimizes models for diverse hardware targets (CPUs, GPUs, and specialized accelerators). It decouples the model definition from hardware-specific code generation, so the same model can be compiled for many backends.
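To make the decoupling concrete, here is a minimal sketch using TVM's classic tensor expression (TE) API, assuming a TVM release that still ships `te.create_schedule` (newer Unity-based releases replace it with TensorIR schedules). The compute definition is written once; only the target string selects the backend.

```python
import tvm
from tvm import te

# Declare the computation once: element-wise vector addition.
n = te.var("n")                       # symbolic length
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# The schedule (loop structure, vectorization, etc.) is kept separate
# from the compute definition above.
s = te.create_schedule(C.op)

# Hardware-specific code generation is selected by the target string;
# the compute and schedule are unchanged.
cpu_mod = tvm.build(s, [A, B, C], target="llvm")
# gpu_mod = tvm.build(s, [A, B, C], target="cuda")  # would also need GPU thread binding
```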
Key Features
- Relay IR: High-level intermediate representation for deep learning models (succeeded by Relax in newer Unity-based releases); see the compile-and-run sketch after this list
- TOPI (TVM Operator Inventory): Library of pre-defined, optimized tensor operator implementations and schedules
- AutoTVM / Ansor: Automated schedule search guided by learned cost models; AutoTVM tunes within hand-written templates, while Ansor (the auto-scheduler) generates search spaces automatically (see the tuning sketch after this list)
- MLC LLM integration: TVM powers mlc-llm for on-device LLM deployment
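As a sketch of the Relay flow referenced above, the snippet below builds and runs a tiny Relay function on CPU. It assumes a TVM build that still ships the Relay frontend and graph executor (newer Unity-based releases favor Relax); the shapes and variable names are illustrative.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# A tiny model: y = relu(dense(x, w)), expressed in Relay.
x = relay.var("x", shape=(1, 8), dtype="float32")
w = relay.var("w", shape=(4, 8), dtype="float32")  # dense expects weight as (units, in)
y = relay.nn.relu(relay.nn.dense(x, w))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Compile for the local CPU; swapping the target retargets the model.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

# Run through the graph executor.
dev = tvm.cpu()
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("x", np.random.rand(1, 8).astype("float32"))
rt.set_input("w", np.random.rand(4, 8).astype("float32"))
rt.run()
print(rt.get_output(0).numpy().shape)  # -> (1, 4)
```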
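And a hedged sketch of AutoTVM's template-based tuning loop over the same module, again assuming the classic AutoTVM API; the trial count and log file name are illustrative, and real tuning uses far more trials.

```python
import tvm
from tvm import relay, autotvm

# Rebuild the tiny dense+relu module from the previous sketch.
x = relay.var("x", shape=(1, 8), dtype="float32")
w = relay.var("w", shape=(4, 8), dtype="float32")
mod = tvm.IRModule.from_expr(
    relay.Function([x, w], relay.nn.relu(relay.nn.dense(x, w))))

# Extract tunable tasks (e.g. the dense op) from the Relay module.
tasks = autotvm.task.extract_from_program(mod["main"], target="llvm", params={})

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=5),
)

# Each task is searched with a cost-model-guided tuner (XGBoost here).
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=32,  # illustrative; production tuning runs hundreds of trials
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tune.log")],
    )

# Rebuild with the best schedules found during the search.
with autotvm.apply_history_best("tune.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm")
```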
Relationship to Other Projects
- Foundation for mlc-llm, which builds on TVM's Relax IR (introduced with TVM Unity)
- Competes with llama.cpp for on-device LLM inference, but takes a compilation approach (machine-generated kernels) rather than relying on hand-written kernels
References
- GitHub: https://github.com/apache/tvm
- Website: https://tvm.apache.org/