Overview of LLM Prompting Techniques

We take a look at some prompting techniques to guide large language models to produce the output we want.

Tags: NLP, LLM, Prompting
Author: Shraddha Anala
Published: January 19, 2024

The other day I came across Andrej Karpathy’s tweet and thought he’d managed to succinctly describe how LLMs behave.

This sentence distills the root of LLM behaviour…

The prompts start the dream, and based on the LLM’s hazy recollection of its training documents, most of the time the result goes someplace useful.

… which shows how LLMs are tools that need to be appropriately guided to produce the outputs we want.

In this blog post, we’ll take a look at some prompting techniques to get better outputs from LLMs, and also understand when to use each technique.

Note that this is a living document, and I’ll continue adding new prompting techniques as I discover and play with them.

Prompting

Prompting involves specifying instructions to an LLM to get it to perform a task. Prompting is done in plain English and forms a part (or whole) of the input to the LLM.
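
For instance, a prompt can be as simple as a plain-English instruction placed alongside the data you want the model to operate on (a toy sketch with made-up text):

  # A prompt is just plain-English instructions, optionally combined with the data to process.
  paragraph = "Large language models generate text one token at a time, conditioned on the prompt."
  prompt = f"Summarize the following paragraph in one sentence:\n\n{paragraph}"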

Depending on the task and the LLM’s subsequent performance, we might find that simple instructions do not give us the response we’re looking for, or that the LLM returns a completely wrong response.

To address these issues, as well as new ones that constantly arise, the NLP field offers a wide variety of prompting techniques, ranging from simple to fairly advanced. Let’s take a look at them and implement some.

Coding simultaneously

I’d recommend trying these techniques out as you read to cement each concept with a practical implementation. Moreover, you can play around with the prompts yourself and elicit different responses from the LLM.

I’m running these examples on Google Colab, and if you’ve got a Colab Pro or higher subscription, you can play with larger models by connecting to a V100 or A100 GPU.

If you don’t have access to these GPUs, no worries; you can still follow along by connecting to a T4 GPU and applying quantization (stay tuned 😬)

Access the Colab notebook here.

Setup

Here’s the Colab notebook containing all the code. I’ll be focusing on the concepts and some key code blocks here.

  1. Install the libraries we’ll be using.

    %%bash
    pip install -U transformers==4.36.1 accelerate bitsandbytes --quiet
  2. Download the model and set up the tokenizer. We will be using the Dolly 2 7B model released by the Databricks team.

     from transformers import AutoTokenizer, AutoModelForCausalLM
     import torch
    
     model_name = "databricks/dolly-v2-7b"
    
     tokenizer = AutoTokenizer.from_pretrained(model_name)
    
     model = AutoModelForCausalLM.from_pretrained(model_name,
                                                 load_in_8bit = False,
                                                 torch_dtype = torch.bfloat16,
                                                 device_map = "auto",
                                                 trust_remote_code = True)

    Note that here I’m not applying any quantization technique as I used the A100 GPU on Colab. If you don’t have access to more expensive GPUs, you can set load_in_8bit = True or use smaller models.
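
    If you’re on a smaller GPU like the T4, one option (a sketch on my part, not from the notebook) is to load the model with 8-bit quantization through a BitsAndBytesConfig instead:

     from transformers import BitsAndBytesConfig

     # Quantize the weights to 8-bit to cut GPU memory use roughly in half vs. bfloat16.
     bnb_config = BitsAndBytesConfig(load_in_8bit = True)

     model = AutoModelForCausalLM.from_pretrained(model_name,
                                                  quantization_config = bnb_config,
                                                  device_map = "auto",
                                                  trust_remote_code = True)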

  3. Set up a text-generation pipeline to prompt the LLM. The Hugging Face Transformers pipeline abstracts away the common steps involved in NLP tasks, such as encoding text, batching data, and decoding the LLM’s output back to text. You pass your prompt as input to the pipeline and get the output back.

      from transformers import pipeline
    
      pipe = pipeline("text-generation",
                      model = model,
                      tokenizer = tokenizer,
                      device_map = "auto",
                      torch_dtype = torch.bfloat16)
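
    As a quick sanity check (an illustrative snippet on my part; the output will vary between runs), you can call the pipeline directly with a prompt:

      # The pipeline returns a list of generated sequences for the given prompt.
      out = pipe("What is the capital of France?",
                 max_new_tokens = 20,
                 return_full_text = False)
      print(out[0]['generated_text'])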
  4. Set up a function that takes a prompt as input and generates the answer. We will make use of the pipeline we set up in the step above.

      def generate_answer(prompt: str, pipe: pipeline = pipe) -> str:
        """
        Prompt the text-generation pipeline to generate a response.

        Args:
          prompt (str): input prompt for the LLM
          pipe (pipeline): Transformers text-generation pipeline wrapping the LLM.

        Returns:
          answer (str): LLM response to the prompt
        """
        sequences = pipe(prompt,
                         do_sample = True,
                         return_full_text = False,
                         temperature = 0.2,
                         num_return_sequences = 1)

        answer = sequences[0]['generated_text']

        return answer

Let me briefly explain the parameters I set in the function above. Take a look at the full list of parameters for the pipeline’s __call__() method here.

1. do_sample = True: This setting enables sampling-based decoding strategies, such as multinomial sampling and top-k sampling, while generating the response.

2. return_full_text = False: When set to False, the input prompt will not be included in the pipeline's response; that is, only the LLM's response is returned.

3. temperature = 0.2: This parameter controls how much creativity the LLM uses to generate responses. Values closer to 0 result in more deterministic and constrained generations, while higher values result in more creative answers (see the sketch after this list).

4. num_return_sequences = 1: Number of sequence candidates to return for each input.
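
To see the effect of do_sample and temperature, here’s a small illustrative comparison (my own sketch, not from the notebook): greedy decoding is deterministic, while sampling at a higher temperature varies between runs.

  # Greedy decoding: always picks the highest-probability token, so repeated runs match.
  greedy = pipe("The capital of France is",
                do_sample = False,
                max_new_tokens = 10,
                return_full_text = False)

  # Sampling with a higher temperature: flatter token distribution, more varied outputs.
  creative = pipe("The capital of France is",
                  do_sample = True,
                  temperature = 0.9,
                  max_new_tokens = 10,
                  return_full_text = False)

  print(greedy[0]['generated_text'])
  print(creative[0]['generated_text'])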

We’ve finished setting up some common functions we’ll use throughout this article. Let’s take a look at the prompting techniques.

Prompting Techniques

i. Zero-shot prompting

In zero-shot prompting, you describe the task the LLM has to perform without giving it any examples of how to perform that task or what the output is supposed to look like.

  zs_prompt = """Identify whether the text is Negative or Positive.

  Text: I visited a new restaurant last week and it'll be my first and last time eating there. I would not even rate it a 1 star, the food and hygiene were awful, and the staff were extremely rude to us.
  Sentiment:"""

  zs_answer = generate_answer(zs_prompt)
  print(zs_answer)

LLM’s response:

Negative

ii. Few-shot prompting

In few-shot prompting, you include a few example pairs of instruction + output, which show the LLM what the task is and what the output should look like.

Few-shot prompting enables in-context learning, where the LLM learns to perform a task from the few examples provided in the prompt.

While playing with a few different few-shot prompts and a couple of open-source models, I noticed that it was quite tricky to get these open-source LLMs to generate clear and complete answers.

I had to craft my prompts carefully based on the format used to train each LLM. I found that the prompt template below was used to train (rather, instruction fine-tune) Dolly, so I used the same template in this few-shot example.

Prompt for Dolly:

  Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:
  Describe the task in detail, as well as the "role" the LLM is supposed to adopt. Be sure to explicitly tell the LLM what *not* to do in the instructions.

  ### EXAMPLES
  1. Example 1
    Output = <response_to_the_task>
  
  2. Example 2
    Output = <response_to_the_task>

  3. Test sentence: 

  ### Response:
  Output =

  ### End

Few-shot prompt:

  INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
  INSTRUCTION_KEY = "### Instruction:"
  INPUT_KEY = "Input:"
  RESPONSE_KEY = "### Response:\nSentiment="
  END_KEY = "### End"

  example_string = f"""
    You will be given a few example sentences with their correct sentiment. Learn to identify the sentiment of the text and predict whether it is Negative, Positive or Neutral.
    Only respond with the sentiment label.

    ### EXAMPLES
    1. This is awesome!
    Sentiment = Positive

    2. This is bad!
    Sentiment = Negative

    3. Well not much to speak about, the museum was average.
    Sentiment = Neutral

    4. That was an awesome class and an even cooler teacher. I liked how she was patient and was good at explaining things. Definitely will take the class again.
    """


Set up the rest of the prompt.

PROMPT_FOR_GENERATION_FORMAT = """{intro}

{instruction_key}
{instruction}

{response_key}

{end_key}
""".format(intro = INTRO_BLURB,
           instruction_key = INSTRUCTION_KEY,
           instruction = example_string,
           response_key = RESPONSE_KEY,
           end_key = END_KEY)


This is what the complete prompt looks like:

  Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:

  You will be given a few example sentences with their correct sentiment. Learn to identify the sentiment of the text and predict whether it is Negative, Positive or Neutral. 
  Only respond with the sentiment label.

  ### EXAMPLES
  1. This is awesome!
  Sentiment = Positive

  2. This is bad! 
  Sentiment = Negative

  3. Well not much to speak about, the museum was average.
  Sentiment = Neutral

  4. That was an awesome class and an even cooler teacher. I liked how she was patient and was good at explaining things. Definitely will take the class again.

  ### Response:
  Sentiment = 

  ### End
  


And to get the output…

print(generate_answer(PROMPT_FOR_GENERATION_FORMAT))

Output =

Positive


Now repeat the same exercise with a negative sentence:

  • Set up the prompt
  example_2 = f"""
    You will be given a few example sentences with their correct sentiment. Learn to identify the sentiment of the text and predict whether it is Negative, Positive or Neutral.
    Only respond with the sentiment label.

    ### EXAMPLES
    1. This is awesome!
    Sentiment = Positive

    2. This is bad!
    Sentiment = Negative

    3. Well not much to speak about, the museum was average.
    Sentiment = Neutral

    4. Hmm this was terrible. What a waste of my time!

    """

  PROMPT_FOR_GENERATION_FORMAT = """{intro}

  {instruction_key}
  {instruction}

  {response_key}

  {end_key}
  """.format(intro = INTRO_BLURB,
            instruction_key = INSTRUCTION_KEY,
            instruction = example_2,
            response_key = RESPONSE_KEY,
            end_key = END_KEY)

This is what the complete prompt looks like:

  Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:

  You will be given a few example sentences with their correct sentiment. Learn to identify the sentiment of the text and predict whether it is Negative, Positive or Neutral.
  Only respond with the sentiment label.

  ### EXAMPLES
  1. This is awesome!
  Sentiment = Positive

  2. This is bad!
  Sentiment = Negative

  3. Well not much to speak about, the museum was average.
  Sentiment = Neutral

  4. Hmm this was terrible. What a waste of my time!


  ### Response:
  Sentiment = 

  ### End

Generate the response:

print(generate_answer(PROMPT_FOR_GENERATION_FORMAT))

And the output:

Negative

Note that you may end up getting responses different from mine due to the stochastic nature of sampling (we set do_sample = True), even with a low temperature. You may also get different responses each time you run the same code.
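
If you want more repeatable runs, one option (a sketch; exact reproducibility is still not guaranteed across different hardware) is to seed the random number generators with transformers’ set_seed before generating:

  from transformers import set_seed

  # Seeds Python's, NumPy's and PyTorch's RNGs so sampling is repeatable on the same machine.
  set_seed(42)
  print(generate_answer(PROMPT_FOR_GENERATION_FORMAT))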


Prompting techniques like zero-shot and few-shot prompting allow you to quickly play around with LLMs and test your ideas without having to spend weeks on an NLP project.

However, these techniques are fairly simple, and if you want to test LLMs on a task that’s useful, interesting or complex, you may find that you need to invest time in either fine-tuning the LLM or exploring more advanced prompting techniques.

iii. Chain-of-Thought [CoT] prompting

Chain-of-Thought prompting can be a useful technique to improve an LLM’s performance on tasks that require complex reasoning before a response is generated.

This technique involves breaking down the complex task into smaller or intermediate reasoning steps while writing the prompt. CoT is usually combined with few-shot prompting to give the LLM examples of the reasoning chains involved before answering.
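
As a quick preview (a toy sketch of my own, not from the notebook), a few-shot CoT prompt spells out the intermediate reasoning inside the worked example so the LLM imitates that chain before stating its final answer:

  # Toy few-shot CoT prompt: the first Q/A pair demonstrates the reasoning chain to imitate.
  cot_prompt = """Q: A cafe sold 23 coffees in the morning and 18 in the afternoon. How many coffees did it sell in total?
  A: It sold 23 coffees in the morning and 18 in the afternoon. 23 + 18 = 41. The answer is 41.

  Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
  A:"""

  print(generate_answer(cot_prompt))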


We will look at Chain-of-Thought prompting and Automatic Chain-of-Thought (Auto-CoT) prompting in depth next. Stay tuned (:

Thanks for reading & Happy Programming!
