The unlocking spell on base LLMs: Rethinking alignment via in-context learning

BY Lin, A Ravichander, X Lu, N Dziri… - The Twelfth…, 2023 - openreview.net
Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning…

Language modeling is compression

G Delétang, A Ruoss, PA Duquenne, E Catt… - arXiv preprint arXiv…, 2023 - arxiv.org
It has long been established that predictive models can be transformed into lossless
compressors and vice versa. Incidentally, in recent years, the machine learning community…

LLMLingua: Compressing prompts for accelerated inference of large language models

H Jiang, Q Wu, CY Lin, Y Yang, L Qiu - arXiv preprint arXiv:2310.05736, 2023 - arxiv.org
Large language models (LLMs) have been applied in various applications due to their
astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT)…

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv…, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However…

LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression

H Jiang, Q Wu, X Luo, D Li, CY Lin, Y Yang… - arXiv preprint arXiv…, 2023 - arxiv.org
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal…

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv…, 2023 - researchgate.net
Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and…

Leave no context behind: Efficient infinite context transformers with infini-attention

T Munkhdalai, M Faruqui, S Gopal - arXiv preprint arXiv:2404.07143, 2024 - arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key…

EdgeMoE: Fast on-device inference of MoE-based large language models

R Yi, L Guo, S Wei, A Zhou, S Wang, M Xu - arXiv preprint arXiv…, 2023 - arxiv.org
Large Language Models (LLMs) such as GPTs and LLaMa have ushered in a revolution in
machine intelligence, owing to their exceptional capabilities in a wide range of machine…

LDB: A large language model debugger via verifying runtime execution step-by-step

L Zhong, Z Wang, J Shang - arXiv preprint arXiv:2402.16906, 2024 - arxiv.org
Large language models (LLMs) are leading significant progress in code generation. Beyond
one-pass code generation, recent works further integrate unit tests and program verifiers into…

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv…, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory…