Prompt Engineering

3 minute read

This post contains survey of notable prompt engineering techniques.

Prompt engineering refers to a way to construct a natural language in such a way to feed to the LLM in order to improve the performance. In this domain, the assumption is that the LLM is a blackbox therefore cannot be directly optimized, so the improvement must come from the prompt or other external parameters.

There are abundant amount of prompt techniques but the trade-off is usually between the performance improvement and the cost, e.g. inference time, numer of tokens, cost of using the LLM, additional models, etc.

Directly feeding the original prompt without any modifications to the LLM is refered to as zero-shot strategy. It is fast, easy, but the performance is not good on complex task.

Expanding the Prompt

Few-Shot / In-Context Learning

Give the LLM several demonstrations of the expected interactions before giving the actual prompt. fewshot

Generate Knowledge

genknow

Generate a “knowledge” that is related to the original prompt
Append the knowledge as additional context to the original prompt to generate the final answer

Retrieval Augmented Generation (RAG)

LLMs are trained with a knowledge cutoff, making more recent information unavailable to access
However, it can perform a zero-shot reasoning considerably well given enough context

genknow

The idea is to create vectors from chunks of documents, i.e. the source of information
For each user prompt, it will be match with the top-K most similar document chunks and combined to give additional context

Expanding the LLM Generation

LLM models are autoregressive, so they generate one token at a time. During the generation process, they consider past tokens to get the next token.

The idea of expanding the LLM generation works because instead of giving the final answer to the prompt directly as a short answer, it generates more token, giving more context for the next token generation, thus resulting in better answer overall.

Chain-of-Thought

cot1

There are also many variations introduced that were derived from this techniques.

cot2

Tree-of-Thought

Similar with CoT, but it works by generating intermediate answer with intermediate reasoning
Similar to how DFS or BFS works, it generates a tree of reasoning and pursue the most likely path to get the best answer

tot

Self-Consistency

It generates the answer for the same prompt multiple times, and select the final answer based on the majority vote (the most consistent).

ReAct

react

Similar with CoT, but it also integrates acting, reasoning, and observation
It generates the next step based on the reasoning and observation of previous steps
It enables LLM to interact with external tools that generate the observation
It also makes the LLM able to adapt each step depending on the observation

RCI

rci

Introduce “retry” mechanism by finding the problems and improve upon it

Reflexion

reflexion1

reflexion2

There are 3 components:

Actor, the LLM that generates the response conditioned by the prompt and observation
Evaluator, the one that evaluates the quality of the generation
Self-reflection, another LLM that generates reinforcement cues to assist the Actor in self-improvement

Prompt Optimization

Previous techniques tries to develop a general algorithm/framework to improve the LLM generation. The following techniques works to develop a prompt that works well on specific queries.

OIRL

oirl

It works by creating an offline reward model to score the prompt. However, direct optimization of this reward model requires the parameter of the LLM, which might not always be available. Even if available, it will be very costly as the LLM contains billions of parameters.
Therefore, a proxy reward model is introduced by training additional model with a supervised learning objective to minimize the discrepancies with the true reward.

OPRO

opro

This algorithm works by prompting the model to “generate a new instruction that achieves a higher accuracy” given trajectories of past solutions paired with their optimization scores, sorted in ascending order
The model has to pick up the pattern to generate a better solution