vLLM

Pinned

vllm Public

A high-throughput and memory-efficient inference and serving engine for LLMs

Python | 23.6k stars | 3.4k forks

Repositories

vllm Public
A high-throughput and memory-efficient inference and serving engine for LLMs

Python | 23,642 stars | Apache-2.0 | 3,386 forks | 1,176 open issues (9 issues need help) | 335 open pull requests | Updated Jul 26, 2024
vllm-project.github.io Public

HTML | 4 stars | MIT | 5 forks | 0 open issues | 0 open pull requests | Updated Jul 26, 2024
llm-compressor Public
HF-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python | 84 stars | Apache-2.0 | 5 forks | 5 open issues | 6 open pull requests | Updated Jul 25, 2024
flash-attention Public Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention

Python | 8 stars | BSD-3-Clause | 1,130 forks | 0 open issues | 0 open pull requests | Updated Jul 24, 2024
buildkite-ci Public

HCL | 5 stars | 11 forks | 0 open issues | 2 open pull requests | Updated Jul 19, 2024
vllm-nccl Public archive
Manages vllm-nccl dependency

Python | 16 stars | Apache-2.0 | 2 forks | 2 open issues | 0 open pull requests | Updated Jun 4, 2024
dashboard Public
vLLM performance dashboard

Python | 13 stars | Apache-2.0 | 3 forks | 0 open issues | 0 open pull requests | Updated Apr 26, 2024

Top languages

Python, HTML, HCL
