Issues: vllm-project/vllm
[Bug]: Discrepancy in vLLM and LoRA Adapter Scores with Different Package Versions (bug) #6800, opened Jul 25, 2024 by pratcooper
[RFC]: Isolate OpenAI Server Into Separate Process (RFC) #6797, opened Jul 25, 2024 by robertgshaw2-neuralmagic
[Bug]: Engine iteration timed out. This should never happen! (bug) #6790, opened Jul 25, 2024 by Kelcin2
[Usage]: can I use it with classification model (e.g. GemmaForSequenceClassification)? (usage) #6789, opened Jul 25, 2024 by dodler
[Feature]: Evaluate multiple ngram speculations in speculative decoding (feature request) #6785, opened Jul 25, 2024 by chenglu66
[Bug]: SIGSEGV received at time=1721904360 on cpu 140, Fatal Python error: Segmentation fault (bug) #6783, opened Jul 25, 2024 by eldarkurtic
[Performance]: Slow TTFT(?) for Qwen2-72B-GPTQ-Int4 on H100 *2 (performance) #6781, opened Jul 25, 2024 by cyc00518
[Bug]: N-gram spec_decode in flash_attention bug (bug) #6780, opened Jul 25, 2024 by chenglu66
[Feature]: support Mistral-Large-Instruct-2407 function calling (feature request) #6778, opened Jul 25, 2024 by ybdesire
[Performance]: Medusa SD have poor performance than baseline (performance) #6777, opened Jul 25, 2024 by cwlseu
[Bug]: qwen2-72b-instruct model with RuntimeError: CUDA error: an illegal memory access was encountered (bug) #6776, opened Jul 25, 2024 by izhuhaoran
[Bug]: --max-model-len configuration robustness (bug) #6774, opened Jul 25, 2024 by gargnipungarg
[Usage]: Pipeline Parallelism but with quantized model? (usage) #6773, opened Jul 25, 2024 by fahadh4ilyas
[Installation]: Unable to build docker image using Dockerfile.openvino (installation) #6769, opened Jul 25, 2024 by zahidulhaque
[Usage]: How to inference a model with medusa speculative sampling. (usage) #6768, opened Jul 25, 2024 by cwlseu
[Bug]: Possible data race when running Llama 405b fp8 (bug) #6767, opened Jul 25, 2024 by tlrmchlsmth
[Bug]: pt_main_thread processes are not killed after main process is killed in MP distributed executor backend (bug) #6766, opened Jul 25, 2024 by oandreeva-nv
[Bug]: FP8 Quantization (static and dynamic) incompatible with --cpu-offload-gb (bug) #6765, opened Jul 25, 2024 by drikster80
[Bug]: premature stopping or cut off output (bug) #6764, opened Jul 25, 2024 by ndao600
[Doc]: ROCm installation instructions do not work (documentation, rocm) #6762, opened Jul 24, 2024 by rlrs
[Bug]: Unable to run meta-llama/Llama-Guard-3-8B-INT8 (bug) #6756, opened Jul 24, 2024 by xfalcox
[Usage]: deploy Llama3.1 405B-Instruct-FP8 with H800 * 8 not work (usage) #6750, opened Jul 24, 2024 by gaoxt1983
[Usage]: The 8xH100 device failed to run meta-llama/Meta-Llama-3.1-405B-Instruct-FP8. (usage) #6746, opened Jul 24, 2024 by jueming0312