The unlocking spell on base LLMs: Rethinking alignment via in-context learning

BY Lin, A Ravichander, X Lu, N Dziri… - The Twelfth…, 2023 - openreview.net
Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning…

Language modeling is compression

G Delétang, A Ruoss, PA Duquenne, E Catt… - arXiv preprint arXiv…, 2023 - arxiv.org
It has long been established that predictive models can be transformed into lossless
compressors and vice versa. Incidentally, in recent years, the machine learning community…

LLMLingua: Compressing prompts for accelerated inference of large language models

H Jiang, Q Wu, CY Lin, Y Yang, L Qiu - arXiv preprint arXiv:2310.05736, 2023 - arxiv.org
Large language models (LLMs) have been applied in various applications due to their
astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT)…

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv…, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However…

LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression

H Jiang, Q Wu, X Luo, D Li, CY Lin, Y Yang… - arXiv preprint arXiv…, 2023 - arxiv.org
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal…

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv…, 2023 - researchgate.net
Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and…

Leave no context behind: Efficient infinite context transformers with infini-attention

T Munkhdalai, M Faruqui, S Gopal - arXiv preprint arXiv:2404.07143, 2024 - arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key…

EdgeMoE: Fast on-device inference of MoE-based large language models

R Yi, L Guo, S Wei, A Zhou, S Wang, M Xu - arXiv preprint arXiv…, 2023 - arxiv.org
Large Language Models (LLMs) such as GPTs and LLaMa have ushered in a revolution in
machine intelligence, owing to their exceptional capabilities in a wide range of machine…

LDB: A large language model debugger via verifying runtime execution step-by-step

L Zhong, Z Wang, J Shang - arXiv preprint arXiv:2402.16906, 2024 - arxiv.org
Large language models (LLMs) are leading significant progress in code generation. Beyond
one-pass code generation, recent works further integrate unit tests and program verifiers into…

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv…, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory…