The unlocking spell on base LLMs: Rethinking alignment via in-context learning
Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning …
Language modeling is compression
It has long been established that predictive models can be transformed into lossless
compressors and vice versa. Incidentally, in recent years, the machine learning community …
LLMLingua: Compressing prompts for accelerated inference of large language models
Large language models (LLMs) have been applied in various applications due to their
astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal …
Efficient large language models: A survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …
Leave no context behind: Efficient infinite context transformers with Infini-attention
T Munkhdalai, M Faruqui, S Gopal - arXiv preprint arXiv:2404.07143, 2024 - arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key …
EdgeMoE: Fast on-device inference of MoE-based large language models
Large Language Models (LLMs) such as GPTs and LLaMA have ushered in a revolution in
machine intelligence, owing to their exceptional capabilities in a wide range of machine …
LDB: A large language model debugger via verifying runtime execution step-by-step
Large language models (LLMs) are leading significant progress in code generation. Beyond
one-pass code generation, recent works further integrate unit tests and program verifiers into …
A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …