Introduction to LLMs
DEFINITIONS
Generative AI AI systems that can produce
realistic content (text, image, etc.)
Large Language Models (LLMs)
Large neural networks trained at internet scale
to estimate the probability of sequences
of words
Ex: GPT, FLAN-T5, LLaMA, PaLM, BLOOM
(transformers with billions of parameters)
Abilities (and computing resources needed)
tend to rise with the number of parameters
USE CASES
– Standard NLP tasks
(classification, summarization, etc.)
– Content generation
– Reasoning (Q&A, planning, coding, etc.)
In-context learning Specifying the task
to perform directly in the prompt
TRANSFORMERS
– Can scale efficiently to use multi-core GPUs
– Can process input data in parallel
– Pay attention to all other words
when processing a word
Transformers’ strength lies in understanding
the context and relevance of all words
in a sentence
Token Word or sub-word
The basic unit processed by transformers
Encoder Processes input sequence
to generate a vector representation (or
embedding) for each token
Decoder Processes input tokens to produce
new tokens
Embedding layer Maps each token
to a trainable vector
Positional encoding vector
Added to the token embedding vector
to keep track of the token’s position
Self-Attention Computes the importance of each word in the input sequence relative to all the other words in the sequence (see the sketch below)
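A minimal NumPy sketch of scaled dot-product self-attention (dimensions and variable names are illustrative, not from the cheatsheet):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq_len, seq_len) attention scores
    weights = softmax(scores, axis=-1)        # importance of each token w.r.t. the others
    return weights @ V                        # weighted sum of value vectors

# Toy example: 4 tokens, embedding dimension 8 (self-attention uses Q = K = V)
x = np.random.randn(4, 8)
print(self_attention(x, x, x).shape)          # (4, 8)
```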
TYPES OF LLMS
Encoder only = Autoencoding model
Ex: BERT, RoBERTa
These are not generative models.
PRE-TRAINING OBJECTIVE To predict tokens masked
in a sentence (= Masked Language Modeling)
OUTPUT Encoded representation of the text
USE CASES Sentence and token classification (e.g., NER)
Decoder only = Autoregressive model
Ex: GPT, BLOOM
PRE-TRAINING OBJECTIVE To predict the next token
based on the previous sequence of tokens
(= Causal Language Modeling)
OUTPUT Next token
USE CASES Text generation
Encoder-Decoder = Seq-to-seq model
Ex: T5, BART
PRE-TRAINING OBJECTIVE Varies from model to model (e.g., span corruption for T5)
OUTPUT Sentinel token + predicted tokens
USE CASES Translation, QA, summarization
CONFIGURATION SETTINGS
Parameters to set at inference time
Max new tokens Maximum number of tokens
generated during completion
Decoding strategy
1. Greedy decoding The word/token with the highest probability is selected from the final probability distribution (prone to repetition)
2. Random sampling The model chooses an output word at random, using the probability distribution to weight the selection (could be too creative)
TECHNIQUES TO CONTROL RANDOM SAMPLING
– Top K The next token is drawn from
the k tokens with the highest probabilities
– Top P The next token is drawn from the smallest set of highest-probability tokens whose combined probability exceeds p
Temperature Influences the shape of the probability distribution through a scaling factor in the softmax layer (see the sketch below)
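A minimal sketch of how temperature, top-k, and top-p interact when sampling the next token (the helper names and thresholds are assumptions, not from the cheatsheet):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Apply temperature scaling, then optional top-k / top-p filtering, then sample."""
    logits = np.asarray(logits, dtype=np.float64) / temperature   # temperature scaling
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]                               # tokens sorted by probability
    if top_k is not None:
        order = order[:top_k]                                     # keep the k most likely tokens
    if top_p is not None:
        cumulative = np.cumsum(probs[order])
        order = order[:np.searchsorted(cumulative, top_p) + 1]    # smallest set with cum. prob >= p

    filtered = probs[order] / probs[order].sum()                  # renormalize over the kept tokens
    return int(np.random.choice(order, p=filtered))

# Toy vocabulary of 5 tokens
print(sample_next_token([2.0, 1.0, 0.5, 0.1, -1.0], temperature=0.7, top_k=3, top_p=0.9))
```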
© 2024 Dataiku
LLM Instruction Fine-Tuning & Evaluation
TASK-SPECIFIC FINE-TUNING | MULTI-TASK FINE-TUNING | MODEL EVALUATION
INSTRUCTION FINE-TUNING
In-Context Learning Limitations:
• May be insufficient for very specific tasks.
• Examples take up space in the context window.
Solution: Instruction Fine-Tuning.
• The LLM generates better completions for a specific task
• Has potentially high computing requirements
The LLM is trained to estimate the next-token probability on a carefully curated dataset of high-quality examples for specific tasks.
To measure and compare LLMs more holistically, use evaluation benchmark datasets specific to model skills (e.g., GLUE, SuperGLUE, MMLU, BIG-bench, HELM).
Task-specific fine-tuning involves training a pre-trained
model on a particular task or domain using a dataset
tailored for that purpose.
Fine-tuning can significantly increase the performance of a model on a specific task, but can reduce the performance on other tasks (“catastrophic forgetting”).
Solutions:
• It might not be an issue if only a single task matters.
• Fine-tune for multiple tasks concurrently (~50K to 100K examples needed).
• Opt for Parameter Efficient Fine-Tuning (PEFT) instead of full fine-tuning, which involves training only a small number of task-specific adapter layers and parameters.
Drawback: It requires a lot of data
(around 50K to 100K examples).
Model variants differ based on the datasets and tasks
used during fine-tuning.
Multi-task fine-tuning diversifies training with examples for multiple tasks, guiding the model to perform various tasks.
Steps:
1. Prepare the training data.
2. Pass examples of training data to the LLM (prompt and ground-truth answer).
3. Compute the cross-entropy loss for each completion token and backpropagate (see the sketch below).
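A minimal sketch of one such training step, assuming PyTorch and a Hugging Face causal LM (the checkpoint name and example pair are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

prompt = "Label this review: Amazing product! Sentiment:"
target = " Positive"

# Concatenate prompt and ground-truth completion; using the same token ids as labels
# trains the model to predict each next token (cross-entropy loss).
inputs = tokenizer(prompt + target, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])

outputs.loss.backward()   # backpropagate the cross-entropy loss
optimizer.step()
optimizer.zero_grad()
```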
Diagram: a task-specific dataset (e.g., translation) of prompt-completion pairs is passed to the pre-trained LLM; its weights are adjusted to produce a fine-tuned LLM.
Often, good results can be achieved with just a
few hundred or thousand examples.
Diagram (single task): a pre-trained LLM is fine-tuned on one instruction (e.g., "Translate the text:" with an English source text and a French completion), yielding an instruct LLM.
Diagram (multi-task): a multi-task training dataset mixes instructions ("Analyze the sentiment", "Identify entities", "Summarize the text", "Translate the text:"); many examples of each task are needed for training.
Diagram (training step): the training-data prompt "Label this review: Amazing product! Sentiment:" is passed to the pre-trained LLM; the LLM completion ("Sentiment: Neutral") is compared with the ground truth ("Sentiment: Positive") to compute the loss.
Example of the FLAN family of models
FLAN, or Fine-tuned LAnguage Net, provides
tailored instructions for refining various
models, akin to dessert after pre-training.
FLAN-T5 is an instruct fine-tuned version of the
T5 foundation model, serving as a versatile model
for various tasks.
FLAN-T5 has been fine-tuned on a total of 473
datasets across 146 task categories. For instance,
the SAMSum dataset was used for summarization.
A specialized variant of this model for chat
summarization or for custom company usage
could be developed through additional fine-tuning
on specialized datasets (e.g., DialogSum or custom
internal data).
Evaluating LLMs Is Challenging
(e.g., various tasks, non-deterministic outputs, equally valid answers with different wordings)
Need for automated and organized performance assessments. Various approaches exist; here are a few examples:
ROUGE & BLEU SCORE
• Purpose: To evaluate LLMs on narrow tasks (summarization, translation) when a reference is available.
• Based on n-grams; rely on precision and recall scores (multiple variants). See the sketch below.
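A minimal sketch using the Hugging Face evaluate library, assuming it is installed (the sentences are illustrative):

```python
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["The cat sat on the mat."]
references = ["The cat lay on the mat."]

print(rouge.compute(predictions=predictions, references=references))  # rouge1, rouge2, rougeL, ...
print(bleu.compute(predictions=predictions, references=references))   # bleu, n-gram precisions, ...
```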
BERT SCORE
• Purpose: To evaluate LLMs in a task-agnostic
manner when a reference is available.
• Based on token-wise comparison, a similarity score
is computed between candidate and reference
sentences.
LLM-as-a-Judge
• Purpose: To evaluate LLMs in a task-agnostic
manner when a reference is available.
• Based on prompting an LLM to assess the equivalence
of a generated answer with a ground-truth answer.
Parameter Efficient Fine-Tuning
(PEFT) Methods
PEFT | LoRA | SOFT PROMPTS
PEFT methods only update a small number of model parameters.
Examples of PEFT techniques:
• Freeze most model weights, and fine-tune only specific layer parameters.
• Keep existing parameters untouched; add only a few new ones or layers
for fine-tuning.
Trade-Off: A smaller rank reduces parameters and accelerates training but risks lower adaptation quality due to reduced task-specific information capture.
In the literature, a rank between 4 and 32 appears to be a good trade-off.
LoRA can be combined with quantization (= QLoRA).
Method to reduce the number of trainable parameters during fine-tuning
by freezing all original model parameters and injecting a pair of rank
decomposition matrices alongside the original weights
Prompt tuning: Add trainable tensors to the model input embeddings,
commonly known as “soft prompts,” optimized directly through
gradient descent.
• Decrease memory usage, often requiring just 1 GPU.
• Mitigate risk of catastrophic forgetting.
• Limit storage to only the new PEFT weights.
Multiple methods exist with trade-offs on parameters or memory efficiency,
training speed, model quality, and inference costs.
Three classes of PEFT methods from the literature:
Full fine-tuning of LLMs is challenging:
Main benefits:
• No impact on inference latency.
• Fine-tuning specifically on the self-attention layers using LoRA is often
enough to enhance performance for a given task.
• Weights can be switched out as needed, allowing for training on many
different tasks.
Additional notes:
• Equal in length to the embedding vectors of the input language tokens
• Can be seen as virtual tokens which can take any value within the
multidimensional embedding space
In prompt tuning, LLM weights are frozen:
• Over time, the embedding vectors of the soft prompt are adjusted to optimize the model's completion of the prompt.
• Only a few parameters are updated.
• A different set of soft prompts can be trained for each task and easily swapped out during inference (occupying very little space on disk).
The literature suggests that, at around 10B parameters, prompt tuning becomes as effective as full fine-tuning.
Soft prompt vectors:
Rank Choice for LoRA Matrices:
The trained parameters can account for only 15%-20% of the
original LLM weights.
Interpreting virtual tokens can pose challenges
(nearest neighbor tokens to the soft prompt location can be used).
LoRA diagram: inputs x flow through the frozen pre-trained weights W0 and, in parallel, through the rank-r decomposition matrices A and B; the two paths are summed into the outputs: h = W0·x + B·A·x.
Full fine-tuning requires a lot of memory: trainable weights plus optimizer states, gradients, activations, and temporary variables.
• Selective: fine-tune only specific parts of the original LLM.
• Reparameterization: use low-rank representations to reduce the number of trainable parameters (e.g., LoRA).
• Additive: augment the pre-trained model with new parameters or layers, training only the additions (adapters, soft prompts).
1 - Keep the majority of the original LLM weights frozen.
2 - Introduce a pair of rank decomposition matrices.
3 - Train the new matrices A and B.
Model weights update (see the sketch below):
1 - Matrix multiplication: B × A
2 - Add to original weights: W0 + B × A
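A minimal sketch of a LoRA-augmented linear layer in PyTorch (the dimensions, rank, and alpha scaling factor are illustrative assumptions):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16):
        super().__init__()
        self.W0 = nn.Linear(d_in, d_out, bias=False)
        self.W0.weight.requires_grad_(False)                    # 1 - original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)   # 2 - rank decomposition matrices
        self.B = nn.Parameter(torch.zeros(d_out, rank))         #     (B starts at zero)
        self.scale = alpha / rank                               # common LoRA scaling convention

    def forward(self, x):
        return self.W0(x) + self.scale * (x @ self.A.T @ self.B.T)  # h = W0·x + B·A·x

layer = LoRALinear(d_in=512, d_out=512, rank=8)
h = layer(torch.randn(4, 512))          # 3 - only A and B receive gradients during training
print(h.shape)                          # torch.Size([4, 512])
```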
Unlike prompt engineering, whose limits are:
• The manual effort required
• The length of the context window
Diagram: a tunable soft prompt (typically 20-100 tokens) is prepended to the input text before being fed to the frozen pre-trained LLM.
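A minimal sketch of prompt tuning with a Hugging Face causal LM in PyTorch; the checkpoint name and number of virtual tokens are illustrative, and a real setup would optimize soft_prompt by gradient descent:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad_(False)                       # LLM weights stay frozen

n_virtual = 20                                    # typically 20-100 virtual tokens
d_model = model.config.hidden_size
soft_prompt = nn.Parameter(torch.randn(1, n_virtual, d_model) * 0.02)  # trainable soft prompt

inputs = tokenizer("Label this review: Amazing product! Sentiment:", return_tensors="pt")
token_embeds = model.get_input_embeddings()(inputs["input_ids"])
embeds = torch.cat([soft_prompt, token_embeds], dim=1)   # prepend the virtual tokens

out = model(inputs_embeds=embeds)                 # only soft_prompt would be optimized
print(out.logits.shape)
```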
LLM Compute Challenges
and Scaling Laws
COMPUTATIONAL CHALLENGES | QUANTIZATION | SCALING LAWS
LARGE LANGUAGE MODEL CHOICE
Generative AI Project Lifecycle
Use case definition & scoping → model selection → adapt (prompt engineering, fine-tuning), augment, and evaluate model → app integration (model optimization, deployment)
Two options for model selection
• Use a pre-trained LLM.
• Train your own LLM from scratch.
Model pre-training:
Model weights are adjusted in order to minimize the loss of the training objective. It requires significant computational resources (i.e., GPUs) due to the high computational load.
Model Cards: list the best use cases, training details, and limitations of models.
Memory Challenge
LLMs are massive and require plenty of memory for training and inference.
In most cases, quantization strongly reduces memory requirements with a limited loss in prediction quality.
The model choice will depend on the details
of the task to carry out.
But, in general...
…develop your application using a pre-trained LLM, unless you work with extremely specific data (e.g., medical, legal).
Hubs: Where you can browse existing models
To load the model into GPU RAM:
1 parameter (32-bit precision) = 4 bytes needed
1B parameters = 4 × 10⁹ bytes = 4 GB of GPU RAM
Pre-training requires storing additional components, beyond the model's parameters:
• Optimizer states (e.g., 2 for Adam)
• Gradients
• Forward activations
• Temporary variables
This could result in an additional 12-20 bytes of memory needed per model parameter.
Number of parameters: BERT 110M, GPT-2 1.5B, YaLM 100B, GPT-3 175B, PaLM 540B.
RuntimeError: CUDA out of memory
Hence, the memory needed for LLM training is:
Excessive for consumer hardware
Even demanding for data center hardware
(for single processor training).
For instance, NVIDIA A100 supports up to
80GB of RAM.
Benefits of quantization:
Less memory
Potentially better model performance
Higher calculation speed
This would mean it requires 16 GB to 24 GB of
GPU memory to train a 1-billion parameter
LLM, around 4-6x the GPU RAM needed just for
storing the model weights.
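A quick back-of-the-envelope helper reproducing these figures (the 12-20 byte overhead range comes from the numbers above; actual usage depends on the optimizer and precision):

```python
def training_memory_gb(n_params, bytes_per_weight=4, overhead_bytes=(12, 20)):
    """Rough low/high training-memory estimate in GB for n_params parameters."""
    low = n_params * (bytes_per_weight + overhead_bytes[0]) / 1e9
    high = n_params * (bytes_per_weight + overhead_bytes[1]) / 1e9
    return low, high

print(training_memory_gb(1e9))   # ~(16.0, 24.0) GB for a 1-billion-parameter model
```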
How can you reduce memory for training?
Quantization: Decrease the memory needed to store the model weights by converting their precision from 32-bit floats to 16-bit floats or 8-bit integers.
How big do the models need to be?
The goal is to maximize model performance.
Researchers explored trade-offs between
the dataset size, the model size, and the
compute budget:
Increasing compute may seem ideal for better
performance, but practical constraints like
hardware, time, and budget limit its feasibility.
What’s the optimal balance?
Once scaling laws have been estimated, we can use the
Chinchilla approach, i.e., we can choose the dataset
size and the model size to train a compute-optimal
model, which maximizes performance for a given
compute budget. The compute-optimal training dataset
size is ~20x the number of parameters.
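A quick sketch of that ~20 tokens-per-parameter rule of thumb (illustrative only):

```python
def chinchilla_optimal_tokens(n_params, ratio=20):
    """Compute-optimal training tokens ≈ 20 × number of parameters."""
    return ratio * n_params

for n in (1e9, 8e9, 70e9):
    print(f"{n/1e9:.0f}B params -> ~{chinchilla_optimal_tokens(n)/1e9:.0f}B training tokens")
# 1B -> ~20B tokens, 8B -> ~160B tokens, 70B -> ~1400B tokens
```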
It has been empirically shown that, when the compute budget remains fixed:
• Fixed model size: increasing training dataset size improves model performance.
• Fixed dataset size: larger models demonstrate lower test loss, indicating enhanced performance.
Quantization maps the FP32 numbers to a lower
precision space by employing scaling factors
determined from the range of the FP32 numbers.
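A minimal sketch of symmetric INT8 quantization with a scale derived from the FP32 range (real schemes add zero-points, per-channel scales, and calibration):

```python
import numpy as np

weights_fp32 = np.random.randn(5).astype(np.float32)

scale = np.abs(weights_fp32).max() / 127.0            # map the FP32 range onto [-127, 127]
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)
weights_dequantized = weights_int8.astype(np.float32) * scale

print(weights_fp32)
print(weights_int8)           # 1 byte per weight instead of 4
print(weights_dequantized)    # close to, but not exactly, the original values
```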
BFLOAT16 is a popular alternative to FP16:
• Developed by Google Brain
• Balances memory efficiency and accuracy
• Wider dynamic range
• Optimized for storage and speed in ML tasks
E.g., FLAN-T5 was pre-trained using BFLOAT16.
Diagram: the FP32 space spans roughly -3 × 10³⁸ to +3 × 10³⁸ around 0.0; FP16, BFLOAT16, INT8, and INT4 map values into this space with progressively lower precision and fewer bytes per value.
Diagram: model performance increases with the compute budget (the constraint); model size (# of parameters) and dataset size (# of tokens) are the scaling choices.
Preference
Fine-Tuning (Part 1)
RLHF PRINCIPLES | COLLECTING HUMAN FEEDBACK | REWARD MODEL
INTRODUCTION
To ensure alignment between LLMs and human values,
emphasis should be placed on qualities like helpfulness,
honesty, and harmlessness (HHH).
The answers have been generated by the model we want
to fine-tune and then assessed by human evaluators or
an LLM.
The action the model will take depends on:
• The prompt text in the context
• The probability distribution across the vocabulary space
The reward model assesses alignment of LLM outputs
with human preferences.
The reward values obtained are then used to update the
LLM weights and train a new human-aligned version,
with the specifics determined by the optimization
algorithm.
Type of ML in which an agent learns to make decisions
towards a specific goal by taking actions in an
environment, aiming to maximize some cumulative
reward.
Action space: All possible actions based on the current
environment state.
Action: Text generation
Action space: Token vocabulary
State: Any text in the current context window
Additional training with preference data can boost
HHH in completions. Detailed instructions improve response quality
and consistency, resulting in labeled completions
that reflect a consensus.
To develop a model or system that accepts a text
sequence and outputs a scalar reward representing
human preference numerically.
Objective:
The reward model, often a language model (e.g., BERT),
is trained using supervised learning on pairwise
comparison data derived from human assessments
of prompts.
Mathematically, it learns to prioritize the human-preferred completion by minimizing the negative log-sigmoid of the reward difference.
Reward model training:
Use the reward model as a binary classifier to assign
reward values to prompt-completion pairs.
Reward value equals the logits output by the model.
Usage of the reward model:
Generative AI Project Lifecycle (reminder): use case definition & scoping → model selection → adapt (prompt engineering, fine-tuning), augment, and evaluate model → app integration (model optimization, deployment)
Some models exhibit undesirable behavior:
• Generating toxic language
• Responding aggressively
• Providing harmful information
Example: for the prompt "How to create a bomb?", a misaligned model answers "In order to create a bomb, you have to…", while an aligned model answers "I'm sorry, but I can't assist with that. Creating a bomb is illegal…"
Reminder on Reinforcement Learning
In the context of LLMs...
Two approaches:
• Reinforcement Learning from Human Feedback (RLHF): preference data is used to train a reward model that mimics human annotator preferences, which then scores LLM completions for reinforcement-learning adjustments.
• Preference Optimization (DPO, IPO): minimize a training loss directly on preference data.
Preference data diagram: each record contains a prompt and two candidate answers (Answer A, Answer B).
Diagram (generic RL): the agent's RL policy (model) takes an action a_t from the action space in the environment; it receives a reward r_t and the next state s_t+1. Objective: win the game!
Diagram (RL for LLMs): the agent is the instruct LLM (RL policy = LLM); the environment is the LLM context; the action space is the token vocabulary; the state s_t is the current context; the reward r_t comes from a reward LLM. Objective: generate aligned text.
Steps:
1. Choose a model and use it to curate a dataset for human feedback.
2. Collect feedback from human labelers (generally, thousands of people):
• Specify the model alignment criterion.
• Request that the labelers rank the outputs according to that criterion.
3. Prepare the data for training: create pairwise training data from the rankings for the training of the reward model.
Diagram: prompt samples (e.g., "The coffee is too bitter") are sent to the LLM, which generates several completions (Completion 1, 2, 3); labelers rank the completions against the alignment criterion (e.g., helpfulness); the rankings are converted into pairwise training data by placing the preferred option first (reordering completions) and assigning rewards [1,0] (1 for the preferred response, 0 for the rejected one in each pair) to train the reward model (RM).
Reward model training diagram: the RM scores (prompt x, completion y_j) and (prompt x, completion y_k), producing rewards r_j and r_k; it is trained with the pairwise loss loss = -log(σ(r_j - r_k)), where y_j is the human-preferred completion.
Reward model usage diagram: for a (prompt x, completion y) pair such as "Samantha enjoys reading books", the RM outputs logits (e.g., Positive: 3.17, Negative: -2.6); the positive-class logit is used as the reward value.
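As a concrete illustration of that pairwise objective, here is a minimal PyTorch sketch (the reward values are placeholders, not from the cheatsheet):

```python
import torch
import torch.nn.functional as F

r_preferred = torch.tensor([3.17, 1.2])   # rewards r_j for the human-preferred completions
r_rejected = torch.tensor([-2.6, 0.4])    # rewards r_k for the rejected completions

loss = -F.logsigmoid(r_preferred - r_rejected).mean()   # loss = -log(sigmoid(r_j - r_k))
print(loss)
```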
Preference
Fine-Tuning (Part 2)
PPO ALGORITHM FOR LLMS | REWARD HACKING | RL FROM AI FEEDBACK
FINE-TUNING WITH RL & REWARD MODEL
The LLM weights are updated to create a human-aligned
model via reinforcement learning, leveraging the reward
model, and starting with a high-performing base model.
Goal: To align the LLM with provided instructions and
human behavior.
As the process advances successfully, the reward will
gradually increase until it meets the predefined evaluation
criteria for helpfulness.
Updated model: The resulting updated model should
be more aligned with human preferences.
Reinforcement learning algorithm: Proximal policy
optimization (PPO) is a popular choice.
Example:
Prompt: A tree is...
Iteration 1: ...a plant with a trunk. → Reward: 0.3
…
Iteration 4: ...a provider of shade and oxygen. → Reward: 1.6
…
Iteration n: ...a symbol of strength and resilience. → Reward: 2.9
PPO iteratively updates the policy to maximize the reward,
adjusting the LLM weights incrementally to maintain
proximity to the previous version within a defined range
for stable learning.
The PPO objective is used to update the LLM weights
by backpropagation:
The agent learns to cheat the system by maximizing
rewards at the expense of alignment with desired behavior.
Value Loss: Minimize it to improve return
prediction accuracy.
Policy Loss: Maximize it to get higher rewards while
staying within reliable bounds.
Entropy Loss: Maximize it to promote and sustain
model creativity.
The higher the entropy, the more creative the policy.
Obtaining the reward model is labor-intensive;
scaling through AI-supervision is more precise and
requires fewer human labels.
Constitutional AI (Bai, Yuntao, et al., 2022)
Approach that relies on a set of principles governing
AI behavior, along with a small number of examples
for few-shot prompting, collectively forming
the “constitution.”
Example of constitutional principle: “Please choose the
response that is the most helpful, honest, and harmless.”
To prevent reward hacking, penalize RL updates if they
significantly deviate from the frozen original LLM, using
KL divergence.
RLHF loop diagram: over N iterations, (1) the updated LLM generates an answer to a prompt (text generation), (2) the reward model (RM) scores the answer, and (3) the model weights are updated with reinforcement learning. Example: for the prompt "The movie was...", the RL-updated LLM completes "...an absolute thrill fest that left me breathless!"
PPO objective diagram: the policy loss, value loss, and entropy loss are combined through hyperparameters. The value loss compares the estimated future total reward (value function) with the actual reward from the reward model. The policy loss uses the ratio of next-token probabilities under the updated LLM and the initial LLM, multiplied by the advantage term and clipped to define the "trust region" (guardrails keeping the policy in the trust region).
Shift-penalty diagram: the same prompt ("The movie was...") goes to the frozen original LLM ("... enjoyable and decent") and to the RL-updated LLM ("... thrilling and unforgettable..."); PPO adds the KL divergence between the two output distributions as a KL penalty in the reward.
DIRECT PREFERENCE
OPTIMIZATION
An RLHF pipeline is difficult to implement:
• Need to train a reward model
• New completions needed during training
• Instability of the RL algorithm
Direct Preference Optimization (DPO) is a simpler
and more stable alternative to RLHF. It solves the
same problem by minimizing a training loss directly
based on the preference data (without reward
modeling or RL).
Identity Preference Optimization (IPO) is a variant
of DPO less prone to overfitting.
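As a reminder of the form of this loss (standard DPO notation, not from the cheatsheet): y_w is the preferred completion, y_l the rejected one, π_ref the frozen reference model, and β a scaling hyperparameter.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
    \log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta\log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```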
Diagram: comparison (preference) data is passed to DPO (or IPO), which directly produces the fine-tuned LLM.
Constitutional AI diagram:
1. Supervised Learning Stage: (1) a helpful LLM produces completions for harmful prompts; (2) it critiques and revises these responses based on the constitutional principles; (3) a pre-trained LLM is fine-tuned on the harmful prompts and revised completions, yielding a fine-tuned LLM.
2. Reinforcement Learning (RL) Stage - RLAIF: (4) the fine-tuned LLM generates pairs of completions for harmful prompts; (5) an LLM is asked which response is best based on the constitutional principles, producing AI-generated comparison data (combined with human-feedback helpfulness data); (6) a preference model is trained on this data; (7) the LLM is fine-tuned using RL against the preference model.
Result: A policy trained by Reinforcement Learning with AI Feedback (RLAIF).
LLM-Powered Applications
LLM-INTEGRATED APPLICATIONS | LLM REASONING WITH CHAIN-OF-THOUGHT PROMPTING | PROGRAM-AIDED LANGUAGE & REACT
MODEL OPTIMIZATION
FOR DEPLOYMENT
• Scale down model complexity while preserving accuracy.
• Train a small student model to mimic a large frozen
teacher model.
• Knowledge can be out of date.
• LLMs struggle with certain tasks (e.g., math).
• LLMs can confidently provide wrong answers
(hallucination).
• Soft labels: Teacher completions serve as ground
truth labels.
• Student and distillation losses update the student model weights via backpropagation (see the sketch below).
• The student LLM can then be used for inference.
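A minimal sketch of how the two losses can be combined, assuming PyTorch (the logits, temperature T, and mixing weight alpha are illustrative):

```python
import torch
import torch.nn.functional as F

teacher_logits = torch.randn(4, 100)           # frozen teacher outputs (soft labels)
student_logits = torch.randn(4, 100, requires_grad=True)
hard_labels = torch.randint(0, 100, (4,))      # ground-truth tokens (hard labels)

T, alpha = 2.0, 0.5                            # distillation temperature and mixing weight
distill_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T
student_loss = F.cross_entropy(student_logits, hard_labels)

loss = alpha * distill_loss + (1 - alpha) * student_loss
loss.backward()
```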
Leverage external apps or data sources
LLM should serve as a reasoning engine.
The prompt and completion are important!
Model Distillation
Chain-of-Thought (CoT)
• Prompts the model to break down problems into sequential steps.
• Operates by integrating intermediate reasoning steps into examples for one- or few-shot inference.
PTQ reduces model weight precision to 16-bit float or
8-bit integer.
• Can target both weights and activation layers for greater impact.
• May sacrifice some model performance, yet beneficial for cost savings and inference speed gains.
Complex reasoning is challenging for LLMs.
E.g., problems with multiple steps, mathematical reasoning
In the completion, the whole prompt is included.
• We retrieve documents most similar to the input query
in the external data.
• We combine the documents with the input query and
send the prompt to the LLM to receive the answer.
Post-Training Quantization (PTQ)
Removes redundant model parameters that contribute
little to the model performance.
Some methods require full model training, while others are
in the PEFT category (LoRA).
Size of the context window can be a limitation.
Model Pruning
Retrieval-Augmented Generation (RAG)
AI framework that integrates external data sources
and apps (e.g., documents, private databases, etc.).
Multiple implementations exist; the choice will depend on the details of the task and the data format.
ReAct
Prompting strategy that combines CoT reasoning and action planning, employing structured examples to guide an LLM in problem-solving and decision-making.
Prompt
Q: Roger has 5 tennis balls. He buys 2 more cans of
tennis balls. Each can has 3 tennis balls. How many
tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis
balls each is 6 tennis balls. 5+6=11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to
make lunch and bought 6 more, how many apples
do they have?
Completion
A: The cafeteria had 23 apples. They used 20 to
make lunch. 23-20=3. They bought 6 more apples,
so 3+6=9. The answer is 9.
Use multiple chunks (e.g., with LangChain)
Shrink model size, maintain performance
Improves performance but struggles with
precision-demanding tasks like tax computation
or discount application.
Inference challenges: High computing and storage demands
Knowledge distillation diagram: labeled training data is fed to the frozen LLM teacher (producing soft labels) and to the LLM student (producing soft predictions and hard predictions); the distillation loss compares soft labels with soft predictions, and the student loss compares hard predictions with the hard labels.
RAG diagram: the user query is passed through a query encoder; the retriever searches the external knowledge source; the query plus retrieved context is sent to the LLM, which returns the answer to the user.
LLM-integrated application diagram: the user interacts with a frontend; an orchestrator (e.g., LangChain) connects the LLM with external data sources and external applications (via APIs, Python, etc.).
! Data must be in a format that allows its relevance to be assessed at inference time.
Use embedding vectors (vector store)
Vector database: Stores vectors and associated
metadata, enabling efficient nearest-neighbor
vector search.
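A minimal sketch of the retrieval step, assuming the sentence-transformers library for embeddings and a brute-force nearest-neighbor search in place of a real vector database (model name, documents, and prompt template are illustrative):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Our refund policy allows returns within 30 days.",
    "The warranty covers manufacturing defects for two years.",
    "Support is available by email from 9am to 5pm.",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = encoder.encode(documents, normalize_embeddings=True)

query = "How long do I have to return a product?"
query_vector = encoder.encode([query], normalize_embeddings=True)[0]

scores = doc_vectors @ query_vector                 # cosine similarity (vectors are normalized)
top_doc = documents[int(np.argmax(scores))]         # nearest-neighbor retrieval

prompt = f"Answer using the context below.\nContext: {top_doc}\nQuestion: {query}\nAnswer:"
print(prompt)                                       # this prompt is then sent to the LLM
```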
Solution: Allow the LLM to communicate with a proficient math program, such as a Python interpreter. ReAct reduces the risk of errors.
LangChain can be used to connect multiple
components through agents, tools, etc.
Agents: Interpret the user input and determine which
tool to use for the task (LangChain includes agents for
PAL & ReAct).
1. Plan actions: follow a set of instructions (e.g., Step 1: get customer ID; Step 2: reset password).
2. Format outputs: requires formatting for applications to understand the actions.
3. Validate actions: collect information that allows validation of an action.
Program-Aided Language (PAL)
Generate scripts and pass them to the interpreter.
The completion is handed off to a Python interpreter.
Calculations are accurate and reliable.
Prompt
Q: Roger has 5 tennis balls. [...]
A:
# Roger started with 5 tennis balls
tennis_balls = 5
# 2 cans of 3 tennis balls each is 6 tennis balls
bought_balls = 2 * 3
# The answer is
answer = tennis_balls + bought_balls
Q: [...]
CoT reasoning
PAL execution
ReAct prompt structure: Instructions → Question → Thought → Action → Observation (the Thought/Action/Observation loop repeats).
• Instructions: define the task, what a thought is, and the allowed actions.
• Question: the question to be answered.
• Thought: analysis of the current situation and the next steps to take.
• Action: the actions come from a predetermined list defined in the set of instructions in the prompt; the loop ends when the action is finish[].
• Observation: result of the previous action.

More Related Content

Similar to LLM Cheatsheet and it's brief introduction

DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
0567Padma
 
Deep Learning for Machine Translation
Deep Learning for Machine TranslationDeep Learning for Machine Translation
Deep Learning for Machine Translation
Matīss ‎‎‎‎‎‎‎  
 
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішенняRoman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Lviv Startup Club
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
Jim Steele
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
indico data
 
Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slides
smpant
 
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
taeseon ryu
 
Se 381 - lec 26 - 26 - 12 may30 - software design - detailed design - se de...
Se 381 - lec 26  - 26 - 12 may30 - software design -  detailed design - se de...Se 381 - lec 26  - 26 - 12 may30 - software design -  detailed design - se de...
Se 381 - lec 26 - 26 - 12 may30 - software design - detailed design - se de...
babak danyal
 
Transformers
TransformersTransformers
Transformers
Anup Joseph
 
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
San Kim
 
Automated Essay Scoring Using Efficient Transformer-Based Language Models
Automated Essay Scoring Using Efficient Transformer-Based Language ModelsAutomated Essay Scoring Using Efficient Transformer-Based Language Models
Automated Essay Scoring Using Efficient Transformer-Based Language Models
Nat Rice
 
Matopt
MatoptMatopt
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
odsc
 
advancedzplmacroprogramming_081820.pptx
advancedzplmacroprogramming_081820.pptxadvancedzplmacroprogramming_081820.pptx
advancedzplmacroprogramming_081820.pptx
ssuser6a1dbf
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference accelerators
DarshanG13
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Intel® Software
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning Optimization
Nikolas Markou
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Association for Computational Linguistics
 
Rsqrd AI: ML Tooling at an AI-first Startup
Rsqrd AI: ML Tooling at an AI-first StartupRsqrd AI: ML Tooling at an AI-first Startup
Rsqrd AI: ML Tooling at an AI-first Startup
Sanjana Chowdhury
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
Lawrence Bernstein
 

Similar to LLM Cheatsheet and it's brief introduction (20)

DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
 
Deep Learning for Machine Translation
Deep Learning for Machine TranslationDeep Learning for Machine Translation
Deep Learning for Machine Translation
 
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішенняRoman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slides
 
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
 
Se 381 - lec 26 - 26 - 12 may30 - software design - detailed design - se de...
Se 381 - lec 26  - 26 - 12 may30 - software design -  detailed design - se de...Se 381 - lec 26  - 26 - 12 may30 - software design -  detailed design - se de...
Se 381 - lec 26 - 26 - 12 may30 - software design - detailed design - se de...
 
Transformers
TransformersTransformers
Transformers
 
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
 
Automated Essay Scoring Using Efficient Transformer-Based Language Models
Automated Essay Scoring Using Efficient Transformer-Based Language ModelsAutomated Essay Scoring Using Efficient Transformer-Based Language Models
Automated Essay Scoring Using Efficient Transformer-Based Language Models
 
Matopt
MatoptMatopt
Matopt
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
 
advancedzplmacroprogramming_081820.pptx
advancedzplmacroprogramming_081820.pptxadvancedzplmacroprogramming_081820.pptx
advancedzplmacroprogramming_081820.pptx
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference accelerators
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning Optimization
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Rsqrd AI: ML Tooling at an AI-first Startup
Rsqrd AI: ML Tooling at an AI-first StartupRsqrd AI: ML Tooling at an AI-first Startup
Rsqrd AI: ML Tooling at an AI-first Startup
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
 

Recently uploaded

一比一原版(usyd毕业证书)悉尼大学毕业证如何办理
一比一原版(usyd毕业证书)悉尼大学毕业证��何办理一比一原版(usyd毕业证书)悉尼大学毕业证如何办理
一比一原版(usyd毕业证书)悉尼大学毕业证如何办理
67n7f53
 
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
Donghwan Lee
 
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile OfferHiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
$A19
 
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeKarol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
bookmybebe1
 
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
1258strict
 
@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time
@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time
@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time
gragyogita3
 
LLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptxLLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptx
Jyotishko Biswas
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
qemnpg
 
Streamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through ModernizationStreamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through Modernization
sanjay singh
 
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
seenu pandey
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model SafeDelhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
dipti singh$A17
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
taqyea
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers
Amazon Web Services Korea
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
Amazon Web Services Korea
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
kihus38
 
How We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeachHow We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeach
javier ramirez
 
iot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptxiot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptx
KiranKumar139571
 

Recently uploaded (20)

一比一原版(usyd毕业证书)悉尼大学毕业证如何办理
一比一原版(usyd毕业证书)悉尼大学毕业证如何办理一比一原版(usyd毕业证书)悉尼大学毕业证如何办理
一比一原版(usyd毕业证书)悉尼大学毕业证如何办理
 
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
 
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile OfferHiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
Hiranandani Gardens @Call @Girls Whatsapp 9833363713 With High Profile Offer
 
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model SafeKarol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
Karol Bagh @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Jya Khan Top Model Safe
 
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
 
@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time
@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time
@Call @Girls Saharanpur 0000000000 Priya Sharma Beautiful And Cute Girl any Time
 
LLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptxLLM powered Contract Compliance Application.pptx
LLM powered Contract Compliance Application.pptx
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
 
Streamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through ModernizationStreamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through Modernization
 
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
( Call ) Girls South Mumbai phone 9930687706 You Are Serach A Beautyfull Doll...
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model SafeDelhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
Delhi @ℂall @Girls ꧁❤ 9711199012 ❤꧂Glamorous sonam Mehra Top Model Safe
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
 
[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
 
How We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeachHow We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeach
 
iot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptxiot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptx
 

LLM Cheatsheet and it's brief introduction

  • 1. DEFINITIONS Generative AI AI systems that can produce realistic content (text, image, etc.) Large Language Models (LLMs) Large neural networks trained at internet scale to estimate the probability of sequences of words Ex: GPT, FLAN-T5, LLaMA, PaLM, BLOOM (transformers with billions of parameters) Abilities (and computing resources needed) tend to rise with the number of parameters USE CASES – Standard NLP tasks (classification, summarization, etc.) – Content generation – Reasoning (Q&A, planning, coding, etc.) In-context learning Specifying the task to perform directly in the prompt Introduction to LLMs TRANSFORMERS – Can scale efficiently to use multi-core GPUs – Can process input data in parallel – Pay attention to all other words when processing a word Transformers’ strength lies in understanding the context and relevance of all words in a sentence Token Word or sub-word The basic unit processed by transformers Encoder Processes input sequence to generate a vector representation (or embedding) for each token Decoder Processes input tokens to produce new tokens Embedding layer Maps each token to a trainable vector Positional encoding vector Added to the token embedding vector to keep track of the token’s position Self-Attention Computes the importance of each word in the input sequence to all other words in the sequence TYPES OF LLMS Encoder only = Autoencoding model Ex: BERT, RoBERTa These are not generative models. PRE-TRAINING OBJECTIVE To predict tokens masked in a sentence (= Masked Language Modeling) OUTPUT Encoded representation of the text USE CASE(S) Sentence classification (e.g., NER) Decoder only = Autoregressive model Ex: GPT, BLOOM PRE-TRAINING OBJECTIVE To predict the next token based on the previous sequence of tokens (= Causal Language Modeling) OUTPUT Next token USE CASES Text generation Encoder-Decoder = Seq-to-seq model Ex: T5, BART PRE-TRAINING OBJECTIVE Vary from model to model (e.g., Span corruption like T5) OUTPUT Sentinel token + predicted tokens USE CASES Translation, QA, summarization CONFIGURATION SETTINGS Parameters to set at inference time Max new tokens Maximum number of tokens generated during completion Decoding strategy 1 GreedyDecoding The word/token with the highest probability is selected from the final probability distribution (prone to repetition) 2 Random Sampling The model chooses an output word at random using the probability distribution to weigh the selection (could be too creative) TECHNIQUES TO CONTROL RANDOM SAMPLING – Top K The next token is drawn from the k tokens with the highest probabilities – Top P The next token is drawn from the tokens with the highest probabilities, whose combined probabilities exceed p Temperature Influence the shape of the probability distribution through a scaling factor in the softmax layer              © 2024 Dataiku
  • 2. LLM Instruction Fine-Tuning Evaluation TASKSPECIFIC FINETUNING MULTI-TASK FINE-TUNING MODEL EVALUATION INSTRUCTION FINETUNING In-Context Learning Limitations: Instruction Fine-Tuning • May be insufficient for very specific tasks. • Examples take up space in the context window. • May be insufficient for very specific tasks. • Examples take up space in the context window. • May be insufficient for very specific tasks. • Examples take up space in the context window. Solutions: • It might not be an issue if only a single task matters. • Fine-tune for multiple tasks concurrently (~50K to 100K examples needed). • Opt for Parameter Efficient Fine-Tuning (PEFT) instead of full fine-tuning, which involves training only a small number of task-specific adapter layers and parameters. • The LLM generates better completions for a specific task • Has potentially high computing requirements The LLM is trained to estimate the next token probability on a cautiously curated dataset of high-quality examples for specific tasks. (e.g., various tasks, non-deterministic outputs, equally valid answers with different wordings). To measure and compare LLMs more holistically, use evaluation benchmark datasets specific to model skills. Task-specific fine-tuning involves training a pre-trained model on a particular task or domain using a dataset tailored for that purpose. Fine-tuning can significantly increase the performance of a model on a specific task, but can reduce the performance on other tasks (“catastrophic forgetting”). Drawback: It requires a lot of data (around 50K to 100K examples). Model variants differ based on the datasets and tasks used during fine-tuning. ROUGE BLEU SCORE Steps: Multi-task fine-tuning diversifies training with examples for multiple tasks, guiding the model to perform various tasks. Various approaches exist, but there are a few examples: 1. Prepare the training data. 2. Pass examples of training data to the LLM (prompt and ground-truth answer). 3. Compute the cross-entropy loss for each completion token and backpropagate. Task-specific examples Prompt-completion pairs Adjusted LLM weights Pre-trained LLM Fine-tuned LLM Prompt, completion Prompt, completion Prompt, completion Task-specific dataset e.g., translation Often, good results can be achieved with just a few hundred or thousand examples. Pre-trained LLM InstructLLM Translate the text: Source text (English) Source completion (French) Multi-task training dataset Many examples of each task needed for training Pre-trained LLM InstructLLM Analyze the sentiment Identify entities Summarize the text Translate the text: Source text (English) Source completion (French) Trainingdata Prompt LLMcompletion Loss Groundtruth Label this review: Amazing product! Sentiment: Label this review: Amazing product! Sentiment: Neutral Label this review: Amazing product! Sentiment: Positive Pre-trained LLM Example of the FLAN family of models FLAN, or Fine-tuned LAnguage Net, provides tailored instructions for refining various models, akin to dessert after pre-training. FLAN-T5 is an instruct fine-tuned version of the T5 foundation model, serving as a versatile model for various tasks. FLAN-T5 has been fine-tuned on a total of 473 datasets across 146 task categories. For instance, the SAMSum dataset was used for summarization. A specialized variant of this model for chat summarization or for custom company usage could be developed through additional fine-tuning on specialized datasets (e.g., DialogSum or custom internal data). 
Evaluating LLMs Is Challenging Need for automated and organized performance assessments • Purpose: To evaluate LLMs on narrow tasks (summarization, translation) when a reference is available • Based on n-grams and rely on precision and recall scores (multiple variants) BERT SCORE • Purpose: To evaluate LLMs in a task-agnostic manner when a reference is available. • Based on token-wise comparison, a similarity score is computed between candidate and reference sentences. LLM-as-a-Judge E.g., GLUE, SuperGLUE, MMLU, Big Bench, Helm • Purpose: To evaluate LLMs in a task-agnostic manner when a reference is available. • Based on prompting an LLM to assess the equivalence of a generated answer with a ground-truth answer.
  • 3. Parameter Efficient Fine-Tuning (PEFT) Methods LoRA SOFT PROMPTS PEFT PEFT methods only update a small number of model parameters. Examples of PEFT techniques: • Freeze most model weights, and fine tune only specific layer parameters. • Keep existing parameters untouched; add only a few new ones or layers for fine-tuning. Trade-Off: A smaller rank reduces parameters and accelerates training but risks lower adaptation quality due to reduced task-specific information capture. In literature, it appears that a rank between 4-32 is a good trade-off. LoRA can be combined with quantization (=QLoRA). Method to reduce the number of trainable parameters during fine-tuning by freezing all original model parameters and injecting a pair of rank decomposition matrices alongside the original weights Prompt tuning: Add trainable tensors to the model input embeddings, commonly known as “soft prompts,” optimized directly through gradient descent. • Decrease memory usage, often requiring just 1 GPU. • Mitigate risk of catastrophic forgetting. • Limit storage to only the new PEFT weights. Multiple methods exist with trade-offs on parameters or memory efficiency, training speed, model quality, and inference costs. Three PEFT methods classes from literature: Fullfine-tuningofLLMsischallenging: Mainbenefits: • No impact on inference latency. • Fine-tuning specifically on the self-attention layers using LoRA is often enough to enhance performance for a given task. • Weights can be switched out as needed, allowing for training on many different tasks. Additionalnotes: • Equal in length to the embedding vectors of the input language tokens • Can be seen as virtual tokens which can take any value within the multidimensional embedding space In prompt tuning, LLM weights are frozen: • Over time, the embedding vector of the soft prompt is adjusted to optimize model’s completion of the prompt • Only few parameters are updated • A different set of soft prompts can be trained for each task and easily swapped out during inference (occupying very little space on disk). From literature, it is shown that at 10B parameters, prompt tuning is as efficient as full fine-tuning. Softpromptvectors: RankChoiceforLoRAMatrices: The trained parameters can account for only 15%-20% of the original LLM weights. Interpreting virtual tokens can pose challenges (nearest neighbor tokens to the soft prompt location can be used). ! LoRA Pre-trained weights W0 + h = W0.x + AB.x B A rank r Outputs h Inputs x Gradients Activations Optimizer states Temporary variables Trainable weights Requiresalot ofmemory Fine-tune only specific parts of the original LLM. Use low-rank representations to reduce the number of trainable parameters. E.g., LoRA Reparameterization Additive Selective Augment the pre-trained model with new parameters or layers, training only the additions. Adapter Softprompts 1 - Keep the majority of the original LLM weights frozen. 2 - Introduce a pair of rank decomposition matrices. 3 - Train the new matrices A and B. Model weights update: 1 - Matrix multiplication: 2 - Add to original weights : B * A = BxA + BxA Unlike prompt engineering, whose limits are: • The manual effort requirements • The length of the context window Pre-trained LLM Tunable soft prompt Input text (Typically, 20-100 tokens)
• 4. LLM Compute Challenges and Scaling Laws
GENERATIVE AI PROJECT LIFECYCLE
1. Use case definition & scoping
2. Model selection
3. Adapt (prompt engineering, fine-tuning), augment, and evaluate the model
4. App integration (model optimization, deployment)
LARGE LANGUAGE MODEL CHOICE
Two options for model selection:
• Use a pre-trained LLM.
• Train your own LLM from scratch.
The model choice will depend on the details of the task to carry out. But, in general, develop your application using a pre-trained LLM, except if you work with extremely specific data (e.g., medical, legal).
Hubs: Where you can browse existing models, whose sizes range from 110M parameters (BERT) to 1.5B (GPT-2), 100B (YaLM), 175B (GPT-3), and 540B (PaLM).
Model cards: List each model's best use cases, training details, and limitations.
Model pre-training: Model weights are adjusted to minimize the loss of the training objective. It requires significant computational resources (i.e., GPUs, due to the high computational load).
COMPUTATIONAL CHALLENGES
Memory challenge: LLMs are massive and require plenty of memory for training and inference (RuntimeError: CUDA out of memory).
To load the model into GPU RAM:
1 parameter (32-bit precision) = 4 bytes needed
1B parameters = 4 x 10^9 bytes = 4 GB of GPU RAM
Pre-training requires storing additional components beyond the model's parameters:
• Optimizer states (e.g., 2 per parameter for Adam)
• Gradients
• Forward activations
• Temporary variables
This can result in an additional 12-20 bytes of memory needed per model parameter. Hence, training a 1-billion-parameter LLM requires roughly 16 GB to 24 GB of GPU memory, around 4-6x the GPU RAM needed just to store the model weights. This is excessive for consumer hardware and demanding even for data center hardware (for single-processor training); for instance, an NVIDIA A100 supports up to 80 GB of RAM.
QUANTIZATION
How can you reduce the memory needed for training?
Quantization: Decrease the memory needed to store the model weights by converting their precision from 32-bit floats to 16-bit floats or 8-bit integers.
Quantization maps the FP32 numbers (whose range spans roughly ±3 x 10^38) to a lower-precision space (FP16 | BFLOAT16 | INT8 | INT4) using scaling factors determined from the range of the FP32 numbers.
Benefits of quantization:
• Less memory
• Higher calculation speed
• Potentially better model performance
In most cases, quantization strongly reduces memory requirements with a limited loss in prediction quality.
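A minimal sketch of symmetric INT8 quantization with a per-tensor scaling factor, assuming NumPy. Real quantization pipelines are more involved (calibration data, per-channel scales, activation quantization); this only illustrates the mapping idea.

import numpy as np

def quantize_int8(x: np.ndarray):
    # Scaling factor derived from the FP32 range of the tensor: the max magnitude maps to 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale  # approximate reconstruction of the original values

w = np.random.randn(4).astype(np.float32)   # toy "weights"
q, scale = quantize_int8(w)
print(w)
print(dequantize(q, scale))                 # close to w, but stored in 1 byte per weight instead of 4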
BFLOAT16 is a popular alternative to FP16:
• Developed by Google Brain
• Balances memory efficiency and accuracy
• Wider dynamic range than FP16
• Optimized for storage and speed in ML tasks
E.g., FLAN-T5 was pre-trained using BFLOAT16.
SCALING LAWS
How big do the models need to be? The goal is to maximize model performance. Researchers explored the trade-offs between three quantities: the compute budget (the constraint), the model size (# of parameters, a scaling choice), and the dataset size (# of tokens, a scaling choice).
It has been shown empirically that, with the compute budget held fixed:
• Fixed model size: Increasing the training dataset size improves model performance.
• Fixed dataset size: Larger models demonstrate lower test loss, indicating enhanced performance.
Increasing compute may seem ideal for better performance, but practical constraints like hardware, time, and budget limit its feasibility. What is the optimal balance?
Once scaling laws have been estimated, we can use the Chinchilla approach, i.e., choose the dataset size and the model size to train a compute-optimal model, which maximizes performance for a given compute budget. The compute-optimal training dataset size is ~20x the number of parameters.
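As a back-of-the-envelope helper for the ~20 tokens-per-parameter rule of thumb, the sketch below computes a compute-optimal dataset size; the exact ratio depends on the scaling laws actually estimated, so treat this purely as an illustration.

def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    # Chinchilla-style rule of thumb: dataset size ~ 20x the number of parameters.
    return n_params * tokens_per_param

print(f"{compute_optimal_tokens(70e9):.1e} tokens for a 70B-parameter model")  # ~1.4e+12 tokens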
• 5. Preference Fine-Tuning (Part 1)
INTRODUCTION
Some models exhibit undesirable behavior:
• Generating toxic language
• Responding aggressively
• Providing harmful information
For example, to the prompt "How to create a bomb?", a misaligned model answers "In order to create a bomb, you have to…", whereas an aligned model answers "I'm sorry, but I can't assist with that. Creating a bomb is illegal…".
To ensure alignment between LLMs and human values, emphasis should be placed on qualities like helpfulness, honesty, and harmlessness (HHH). Additional training with preference data can boost HHH in completions. In the Generative AI Project Lifecycle, this happens in the "Adapt, augment, and evaluate model" stage, between model selection and app integration.
Two approaches:
• Reinforcement Learning from Human Feedback (RLHF): Preference data is used to train a reward model that mimics human annotator preferences, which then scores LLM completions for reinforcement-learning adjustments.
• Preference optimization (DPO, IPO): Minimize a training loss directly on the preference data.
RLHF PRINCIPLES
Reminder on reinforcement learning: a type of ML in which an agent (the RL policy) learns to make decisions towards a specific goal by taking actions in an environment, aiming to maximize a cumulative reward. The action space is the set of all possible actions given the current environment state.
In the context of LLMs:
• Agent / RL policy: the LLM
• Environment: the LLM context
• State: any text in the current context window
• Action: text generation
• Action space: the token vocabulary
• Objective: generate aligned text
The action the model takes depends on:
• The prompt text in the context
• The probability distribution across the vocabulary space
The reward model assesses the alignment of LLM outputs with human preferences. The reward values obtained are then used to update the LLM weights and train a new, human-aligned version, with the specifics determined by the optimization algorithm.
COLLECTING HUMAN FEEDBACK
Preference data consists of a prompt and several answers (e.g., Answer A and Answer B) generated by the model we want to fine-tune and then assessed by human evaluators or an LLM.
1. Choose a model and use it to curate a dataset for human feedback.
2. Collect feedback from human labelers (generally, thousands of people):
• Specify the model alignment criterion.
• Request that the labelers rank the outputs according to that criterion.
3. Prepare the data for training: create pairwise training data from the rankings for the training of the reward model.
Detailed instructions improve response quality and consistency, resulting in labeled completions that reflect a consensus.
REWARD MODEL
Objective: To develop a model or system that accepts a text sequence and outputs a scalar reward representing human preference numerically.
Reward model training: The reward model, often a language model (e.g., BERT), is trained with supervised learning on pairwise comparison data derived from human assessments of prompts. Mathematically, it learns to prioritize the human-preferred completion by minimizing the negative log-sigmoid of the reward difference.
Usage of the reward model: Use the reward model as a binary classifier to assign reward values to prompt-completion pairs. The reward value equals the logits output by the model.
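For concreteness, a single pairwise comparison record produced at step 3 might look like the sketch below; the field names and both answer texts are hypothetical, and the prompt reuses the helpfulness example from the reward-model figure that follows.

# One hypothetical pairwise preference record (prompt, preferred answer, rejected answer).
preference_record = {
    "prompt": "The coffee is too bitter",
    "chosen": "Try adding a splash of milk or a bit of sugar to soften the bitterness.",
    "rejected": "Then don't drink it.",
}
print(preference_record["chosen"])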
Reward model training workflow (figure): prompt samples (e.g., "The coffee is too bitter") are sent to the LLM, which produces several completions (Completion 1, 2, 3). Human labelers rank the completions according to the alignment criterion (here, helpfulness), and the rankings are converted into pairwise data:
• For each pair of completions, assign 1 to the preferred response and 0 to the rejected one (reward labels such as [1,0]).
• Reorder each pair so that the preferred option comes first.
The reward model (RM) is then trained on pairs (Prompt x, Completion y_j) and (Prompt x, Completion y_k), producing rewards r_j and r_k, with the loss
loss = -log(σ(r_j - r_k))
where y_j is the human-preferred completion.
Usage (figure): fed a pair (Prompt x, Completion y) such as "Samantha enjoys reading books", the reward model outputs logits for the positive and negative classes (e.g., 3.17 and -2.6); the positive-class logit is used as the reward value.
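A minimal PyTorch sketch of the pairwise loss -log(σ(r_j - r_k)) above; the two reward values reuse the example logits, and everything else is illustrative.

import torch
import torch.nn.functional as F

def reward_model_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Minimizing this pushes the reward of the preferred completion above the rejected one.
    return -F.logsigmoid(r_preferred - r_rejected).mean()

r_j = torch.tensor([3.17])   # reward for the human-preferred completion
r_k = torch.tensor([-2.6])   # reward for the rejected completion
print(reward_model_loss(r_j, r_k))  # already small: the model ranks this pair correctly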
• 6. Preference Fine-Tuning (Part 2)
FINE-TUNING WITH RL AND THE REWARD MODEL
The LLM weights are updated to create a human-aligned model via reinforcement learning, leveraging the reward model and starting from a high-performing base model.
Goal: To align the LLM with the provided instructions and human preferences.
The fine-tuning loop, repeated for N iterations:
1. Text generation: the LLM produces an answer to a prompt.
2. Scoring: the reward model (RM) scores the answer.
3. Model weights update with reinforcement learning.
As the process advances successfully, the reward gradually increases until it meets the predefined evaluation criteria for helpfulness.
Example with the prompt "A tree is...":
Iteration 1: "...a plant with a trunk." → Reward: 0.3
Iteration 4: "...a provider of shade and oxygen." → Reward: 1.6
Iteration n: "...a symbol of strength and resilience." → Reward: 2.9
Reinforcement learning algorithm: Proximal Policy Optimization (PPO) is a popular choice.
Updated model: The resulting model should be more aligned with human preferences.
PPO ALGORITHM FOR LLMS
PPO iteratively updates the policy to maximize the reward, adjusting the LLM weights incrementally so that they stay within a defined range of the previous version (the "trust region") for stable learning.
The PPO objective is used to update the LLM weights by backpropagation. It combines three terms, weighted by hyperparameters:
• Policy loss: Maximize it to get higher rewards while staying within reliable bounds. It compares the probabilities of the next token under the updated LLM and under the initial LLM, weighted by an advantage term, and defines the "trust region" guardrails that keep the policy close to the previous version.
• Value loss: Minimize it to improve return prediction accuracy; it measures the gap between the estimated future total reward (value function) and the actual reward from the reward model (e.g., prompt "The movie was...", completion "...an absolute thrill fest that left me breathless!").
• Entropy loss: Maximize it to promote and sustain model creativity. The higher the entropy, the more creative the policy.
REWARD HACKING
The agent learns to cheat the system by maximizing the reward at the expense of alignment with the desired behavior.
To prevent reward hacking, RL updates are penalized when they deviate significantly from the frozen original LLM, using KL divergence: for a given prompt (e.g., "The movie was..."), the completions of the original LLM ("...enjoyable and decent") and of the RL-updated LLM ("...thrilling and unforgettable...") are compared, and the KL divergence is added to the reward as a shift penalty.
RL FROM AI FEEDBACK
Obtaining the reward model is labor-intensive; scaling through AI supervision is more precise and requires fewer human labels.
Constitutional AI (Bai, Yuntao, et al., 2022): an approach that relies on a set of principles governing AI behavior, together with a small number of examples for few-shot prompting, which collectively form the "constitution."
Example of a constitutional principle: "Please choose the response that is the most helpful, honest, and harmless."
The resulting training pipeline has two stages, a supervised learning stage followed by a reinforcement learning stage (RLAIF); it is detailed after the notes on Direct Preference Optimization below.
DIRECT PREFERENCE OPTIMIZATION
An RLHF pipeline is difficult to implement:
• A reward model needs to be trained.
• New completions are needed during training.
• The RL algorithm is unstable.
Direct Preference Optimization (DPO) is a simpler and more stable alternative to RLHF. It solves the same problem by minimizing a training loss computed directly on the comparison (preference) data, without reward modeling or RL, to obtain a fine-tuned LLM.
Identity Preference Optimization (IPO) is a variant of DPO that is less prone to overfitting.
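A hedged sketch of the DPO loss on a single preference pair, assuming PyTorch. The beta value and the log-probabilities are made-up examples; a real implementation would compute sequence log-probabilities under the policy being trained and under a frozen reference model.

import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit rewards are beta * log-ratio between the policy and the frozen reference model.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Made-up sequence log-probabilities for one (chosen, rejected) completion pair:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss)  # lower when the policy prefers the chosen completion more than the reference does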
CONSTITUTIONAL AI TRAINING PIPELINE (RL FROM AI FEEDBACK)
1. Supervised learning stage:
1 - Prompt a helpful LLM with harmful prompts and collect its completions.
2 - Ask the model to critique and revise its responses based on the constitutional principles.
3 - Fine-tune a pre-trained LLM on the harmful prompts and the revised completions, yielding a fine-tuned LLM.
2. Reinforcement learning (RL) stage, i.e., RLAIF:
4 - Use the fine-tuned LLM to generate pairs of completions for harmful prompts.
5 - Ask which response is best based on the constitutional principles, producing AI-generated comparison data (combined with human-feedback helpfulness data).
6 - Train a preference model on this comparison data.
7 - Fine-tune the LLM using RL against the preference model.
Result: A policy trained by Reinforcement Learning with AI Feedback (RLAIF).
• 7. LLM-Powered Applications
MODEL OPTIMIZATION FOR DEPLOYMENT
Inference challenges: high computing and storage demands. The goal is to scale down model complexity while preserving accuracy (shrink model size, maintain performance).
Model distillation:
• Train a small student model to mimic a large, frozen teacher model.
• Soft labels: the teacher's completions serve as ground-truth labels.
• The distillation loss (computed against the teacher's soft labels and the student's soft predictions) and the student loss (computed against the hard labels of the labeled training data and the student's hard predictions) update the student model's weights via backpropagation.
• The student LLM is then used for inference.
Post-training quantization (PTQ):
• Reduces model weight precision to 16-bit floats or 8-bit integers.
• Can target both the weights and the activation layers.
• May sacrifice some accuracy, yet is beneficial for cost savings and gains in computational performance.
Model pruning:
• Removes redundant model parameters that contribute little to model performance.
• Some methods require full model training, while others fall into the PEFT category (e.g., LoRA).
LLM REASONING WITH CHAIN-OF-THOUGHT PROMPTING
Complex reasoning is challenging for LLMs, e.g., problems with multiple steps or mathematical reasoning.
Chain-of-Thought (CoT) prompting:
• Prompts the model to break problems down into sequential steps.
• Works by integrating intermediate reasoning steps into the examples used for one- or few-shot inference.
Example prompt:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Completion (the whole prompt is included in the completion):
A: The cafeteria had 23 apples. They used 20 to make lunch. 23 - 20 = 3. They bought 6 more apples, so 3 + 6 = 9. The answer is 9.
CoT improves performance but struggles with precision-demanding tasks like tax computation or discount application (see Program-Aided Language below).
LLM-INTEGRATED APPLICATIONS
LLM limitations:
• Knowledge can be out of date.
• LLMs struggle with certain tasks (e.g., math).
• LLMs can confidently provide wrong answers (hallucination).
Solution: leverage external applications or data sources, with the LLM serving as a reasoning engine. In an LLM-integrated application, an orchestrator sits between the frontend and the LLM, external data sources, external applications, and APIs (e.g., Python). The prompt and the completion are important!
Retrieval-Augmented Generation (RAG): an AI framework that integrates external data sources and apps (e.g., documents, private databases). Multiple implementations exist; the right one depends on the details of the task and the data format.
• Retrieve the documents most similar to the input query from the external knowledge (retriever + query encoder).
• Combine the retrieved documents with the input query and send the prompt to the LLM to receive the answer.
• The size of the context window can be a limitation: use multiple chunks (e.g., with LangChain).
• The data must be in a format that allows its relevance to be assessed at inference time: use embedding vectors (vector store).
• Vector database: stores vectors and associated metadata, enabling efficient nearest-neighbor vector search.
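To tie the RAG steps above together, here is a minimal retrieval sketch. The embed() function and the final llm() call are hypothetical placeholders for a real embedding model and a real LLM endpoint, and the documents are toy examples.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

documents = ["Refund policy: items can be returned within 30 days.",
             "Shipping usually takes 3-5 business days."]
doc_vectors = [embed(d) for d in documents]          # the vector store, built offline

query = "How long do I have to return an item?"
q_vec = embed(query)
best_doc = max(zip(documents, doc_vectors), key=lambda dv: cosine(q_vec, dv[1]))[0]

prompt = f"Answer using the context below.\nContext: {best_doc}\nQuestion: {query}"
# answer = llm(prompt)  # hypothetical call: send the augmented prompt to the LLM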
PROGRAM-AIDED LANGUAGE (PAL)
Solution to CoT's precision limits: allow the LLM to communicate with a proficient math program, such as a Python interpreter.
• The LLM generates a script (its CoT reasoning written as code), and the completion is handed off to a Python interpreter for execution (PAL execution).
• Calculations are therefore accurate and reliable.
Example prompt:
Q: Roger has 5 tennis balls. [...]
A: # Roger started with 5 tennis balls
tennis_balls = 5
# 2 cans of 3 tennis balls each is
bought_balls = 2 * 3
# tennis balls. The answer is
answer = tennis_balls + bought_balls
Q: [...]
REACT
Prompting strategy that combines CoT reasoning and action planning, using structured examples to guide an LLM in problem-solving and decision-making; ReAct reduces the risk of errors.
The prompt contains:
• Instructions: define the task, what a thought is, and the allowed actions (the actions come from a predetermined list defined in the instructions).
• Question: the question to be answered.
• Thought: analysis of the current situation and of the next steps to take.
• Action: the action to take; the loop ends when the action is finish[].
• Observation: the result of the previous action.
LangChain can be used to connect multiple components through agents, tools, etc.
Agents: interpret the user input and determine which tool to use for the task (LangChain includes agents for PAL and ReAct). They:
1. Plan actions (e.g., Step 1: Get customer ID; Step 2: Reset password).
2. Format outputs (formatting is required for applications to understand the actions).
3. Validate actions (collect information that allows an action to be validated).
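To illustrate the Thought/Action/Observation loop, a minimal ReAct-style driver might look like the sketch below. The calculator tool and the llm callable are hypothetical placeholders; a real model would return text formatted as "Thought: ... Action: tool[input]" as specified in the instructions.

import re

def calculator(expression: str) -> str:
    # Hypothetical tool from the predetermined action list (restricted eval, arithmetic only).
    return str(eval(expression, {"__builtins__": {}}))

def react_loop(question: str, llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                     # the model produces a Thought and an Action
        transcript += step + "\n"
        action = re.search(r"Action: (\w+)\[(.*)\]", step)
        if action is None or action.group(1) == "finish":
            return action.group(2) if action else ""
        if action.group(1) == "calculator":
            # Observation: the result of the previous action is appended to the context.
            transcript += f"Observation: {calculator(action.group(2))}\n"
    return ""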