Issues: vllm-project/vllm
[Bug]: Discrepancy in vLLM and LoRA Adapter Scores with Different Package Versions (bug) #6800, opened Jul 25, 2024 by pratcooper
[RFC]: Isolate OpenAI Server Into Separate Process (RFC) #6797, opened Jul 25, 2024 by robertgshaw2-neuralmagic
[Bug]: Engine iteration timed out. This should never happen! (bug) #6790, opened Jul 25, 2024 by Kelcin2
[Usage]: can I use it with classification model (e.g. GemmaForSequenceClassification)? (usage) #6789, opened Jul 25, 2024 by dodler
[Feature]: Evaluate multiple ngram speculations in speculative decoding (feature request) #6785, opened Jul 25, 2024 by chenglu66
[Bug]: SIGSEGV received at time=1721904360 on cpu 140, Fatal Python error: Segmentation fault (bug) #6783, opened Jul 25, 2024 by eldarkurtic
[Performance]: Slow TTFT(?) for Qwen2-72B-GPTQ-Int4 on H100 *2 (performance) #6781, opened Jul 25, 2024 by cyc00518
[Bug]: N-gram spec_decode in flash_attention bug (bug) #6780, opened Jul 25, 2024 by chenglu66
[Feature]: support Mistral-Large-Instruct-2407 function calling (feature request) #6778, opened Jul 25, 2024 by ybdesire
[Performance]: Medusa SD have poor performance than baseline (performance) #6777, opened Jul 25, 2024 by cwlseu
[Bug]: qwen2-72b-instruct model with RuntimeError: CUDA error: an illegal memory access was encountered (bug) #6776, opened Jul 25, 2024 by izhuhaoran
[Bug]: --max-model-len configuration robustness (bug) #6774, opened Jul 25, 2024 by gargnipungarg
[Usage]: Pipeline Parallelism but with quantized model? (usage) #6773, opened Jul 25, 2024 by fahadh4ilyas
[Installation]: Unable to build docker image using Dockerfile.openvino (installation) #6769, opened Jul 25, 2024 by zahidulhaque
[Usage]: How to inference a model with medusa speculative sampling. (usage) #6768, opened Jul 25, 2024 by cwlseu
[Bug]: Possible data race when running Llama 405b fp8 (bug) #6767, opened Jul 25, 2024 by tlrmchlsmth
[Bug]: pt_main_thread processes are not killed after main process is killed in MP distributed executor backend (bug) #6766, opened Jul 25, 2024 by oandreeva-nv
[Bug]: FP8 Quantization (static and dynamic) incompatible with --cpu-offload-gb (bug) #6765, opened Jul 25, 2024 by drikster80
[Bug]: premature stopping or cut off output (bug) #6764, opened Jul 25, 2024 by ndao600
[Doc]: ROCm installation instructions do not work (documentation, rocm) #6762, opened Jul 24, 2024 by rlrs
[Bug]: Unable to run meta-llama/Llama-Guard-3-8B-INT8 (bug) #6756, opened Jul 24, 2024 by xfalcox
[Usage]: deploy Llama3.1 405B-Instruct-FP8 with H800 * 8 not work (usage) #6750, opened Jul 24, 2024 by gaoxt1983
[Usage]: The 8xH100 device failed to run meta-llama/Meta-Llama-3.1-405B-Instruct-FP8. (usage) #6746, opened Jul 24, 2024 by jueming0312