A simple guide to fine-tuning Llama 2

Sam L'Huillier, Updated by Harper Carroll

July 24, 2023, Last Updated September 26, 2023 · 6 min read

In this guide, I show how you can fine-tune Llama 2 to be a dialog summarizer!

Last weekend, I wanted to fine-tune Llama 2 (which now reigns supreme on the Open LLM leaderboard) on my own collection of Google Keep notes. Each note has both a title and a body, so I wanted to train Llama to generate a body from a given title.

This first part of the tutorial covers fine-tuning Llama 2 on the samsum dialog summarization dataset using Hugging Face libraries. I find that while Hugging Face has built a superb library in transformers, their guides tend to overcomplicate things for the average Joe. The second part, fine-tuning on custom data, is here!

To get started, use this one-click link to get yourself an L4, A10, or A100 (or any GPU with at least 24GB of GPU memory).
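As a rough back-of-the-envelope check on why a 24GB-class GPU is comfortable for the 7B model, here is a minimal sketch that estimates the memory needed for the weights alone at a few precisions. It deliberately ignores activations, gradients, and optimizer state, so treat it as a lower bound. The helper name is my own and is not part of any library used in this guide.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# Nominal parameter count for the 7B model (an approximation).
N_PARAMS_7B = 7e9

for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gib(N_PARAMS_7B, nbytes):.1f} GiB for weights")
```

Lower-precision weights leave more headroom for everything else the GPU has to hold during training.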

Build your Verb container:

Once you've checked out your machine and landed in your instance's page, select Python 3.10 and CUDA 12.0.1 and click the "Build" button to build your Verb container. Give this a few minutes.

Open your new Brev Notebook:

Once the Verb container is finished loading, click the 'Notebook' button on the top right of your screen once it illuminates. You will be taken to a Jupyter Lab environment. Under "Other", click "Terminal". Run the following commands.

You can also SSH into the development environment and run the commands below from there, using brev open [your-machine-name] (to enter via VSCode) or brev shell [your-machine-name] (to enter via shell). Both require the Brev CLI; you can install it here.

You can also install a zsh kernel for Jupyter Notebook by running pip install zsh-jupyter-kernel in the Terminal, which lets you run the following commands in notebook cells.

1. Download the model

Clone Meta's Llama inference repo (which contains the download script):

git clone

Then run the download script:

cd llama

It'll prompt you to enter the URL you got sent by Meta in an email. If you haven't signed up, do it here. They are surprisingly quick at sending you the email!

For this guide, you only need to download the 7B model.

2. Convert model to Hugging Face format

pip install git+
pip install protobuf accelerate bitsandbytes scipy
pip install -e .
python \
    --input_dir llama-2-7b --model_size 7B --output_dir llama-2-7b/7B

If you originally only downloaded the 7B model, you need to make sure you move the model files into a directory called 7B. You may also need to move the tokenizer* files into llama-2-7b. Use this structure for your directory llama-2-7b:

├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
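If you'd rather not shuffle the files around by hand, the layout above can be sketched in a few lines of Python. This is a hypothetical helper I wrote for illustration (the function name and the file patterns are assumptions based on the tree shown above, not part of Meta's or Hugging Face's tooling). It moves any model files sitting directly in the root into the 7B folder and reports anything that is still missing.

```python
from pathlib import Path
import shutil

# File names taken from the directory tree above (assumed, not exhaustive).
MODEL_PATTERNS = ("checklist.chk", "consolidated.*.pth", "params.json")
TOKENIZER_FILES = ("tokenizer.model", "tokenizer_checklist.chk")


def arrange_llama_dir(root: str, size_dir: str = "7B") -> list[str]:
    """Move model files found directly under `root` into `root/size_dir`.

    Returns the expected files that are still missing afterwards.
    """
    root_path = Path(root)
    target = root_path / size_dir
    target.mkdir(parents=True, exist_ok=True)

    # Move model files that sit next to the tokenizer into the size folder.
    for pattern in MODEL_PATTERNS:
        for f in list(root_path.glob(pattern)):
            shutil.move(str(f), str(target / f.name))

    # Report anything the layout above expects but we could not find.
    missing = []
    for pattern in MODEL_PATTERNS:
        if not any(target.glob(pattern)):
            missing.append(f"{size_dir}/{pattern}")
    for name in TOKENIZER_FILES:
        if not (root_path / name).is_file():
            missing.append(name)
    return missing
```

Calling arrange_llama_dir("llama-2-7b") should return an empty list once everything is in place.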

This now gives us a Hugging Face model that we can fine-tune using Hugging Face libraries!

3. Run the fine-tuning notebook:

Clone the llama-recipes repo:

git clone

and run

mv llama-recipes/examples/quickstart.ipynb llama-recipes/src/llama_recipes

Then using the file navigator on the left, navigate to quickstart.ipynb inside llama-recipes. Uncomment the pip install command in the first cell, and run it to install more requirements, and then restart the kernel. Then, run the rest of the notebook.

In the notebook, add a cell, and insert & run:

import os

In Step 1, change the line:




and in Step 2, change the lines

from utils.dataset_utils import get_preprocessed_dataset
from configs.datasets import samsum_dataset

to

from llama_recipes.utils.dataset_utils import get_preprocessed_dataset
from llama_recipes.configs.datasets import samsum_dataset
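If other cells use the old top-level module names too, the same rename can be applied mechanically. Here is a minimal sketch, assuming the only change needed is prefixing utils and configs with the llama_recipes package name. The function is hypothetical and just operates on a string of source code; you would still paste the result back into the cell yourself.

```python
import re

# Matches "from utils..." / "from configs..." at the start of a line.
_OLD_IMPORT = re.compile(r"^(\s*from\s+)(utils|configs)(\.|\s)", re.MULTILINE)


def use_package_imports(source: str) -> str:
    """Rewrite old top-level imports to the llama_recipes package namespace."""
    return _OLD_IMPORT.sub(r"\1llama_recipes.\2\3", source)


old_cell = (
    "from utils.dataset_utils import get_preprocessed_dataset\n"
    "from configs.datasets import samsum_dataset\n"
)
print(use_package_imports(old_cell))
```

Lines that already import from llama_recipes are left alone, so running it twice is harmless.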

And that's that! You will end up with a LoRA fine-tuned model, and in Step 8, you can run inference on it.
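To give a feel for why a LoRA fine-tune is so much lighter than updating every weight, here is a small sketch that counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation (the original matrix stays frozen and a low-rank pair of matrices is trained alongside it). The 4096 width and rank 8 are illustrative values I picked, not settings read from the notebook.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank pair: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative values only (assumed, not taken from the notebook's config).
d, r = 4096, 8
dense = full_params(d, d)
lora = lora_trainable_params(d, d, r)
print(f"dense: {dense:,} params, LoRA rank {r}: {lora:,} params "
      f"({100 * lora / dense:.2f}% of dense)")
```

The fraction shrinks further as the matrix gets wider, since the dense count grows quadratically while the LoRA count grows linearly.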

Next in this series, I'll show you how you can format your own dataset to train Llama 2 on a custom task!

A simple guide to fine-tuning Llama 2 on your own data