
A simple guide to fine-tuning Llama 2

Sam L'Huillier

July 24, 2023 · 6 min read

In this guide, I show how you can fine-tune Llama 2 to be a dialog summarizer!

Last weekend, I wanted to fine-tune Llama 2 (which now reigns supreme on the Open LLM Leaderboard) on my own collection of Google Keep notes; each note has a title and a body, so I wanted to train Llama to generate a body from a given title.

This first part of the tutorial covers fine-tuning Llama 2 on the samsum dialog summarization dataset using Hugging Face libraries. I tend to find that while Hugging Face has built a superb library in transformers, their guides tend to overcomplicate things for the average Joe. The second part, fine-tuning on custom data, is here!
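If you want to peek at the data before training, you can pull samsum straight from the Hugging Face Hub. This is just a quick sketch using the datasets library (the dataset ships as a 7z archive, so you may also need to pip install py7zr):

from datasets import load_dataset

# Load the samsum dialog summarization dataset and print one example
samsum = load_dataset("samsum")
print(samsum["train"][0]["dialogue"])
print(samsum["train"][0]["summary"])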

To get started, get yourself an L4, A10, or A100 (or any GPU with at least 24GB of memory). If you're not sure where to start, the Brev Cloud makes it easy to access each of these GPUs!
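If you want to double-check what you've got, here's a quick sanity check with PyTorch (assuming CUDA is available):

import torch

# Print the GPU name and total memory to confirm you have enough headroom
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB")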

1. Download the model

Clone Meta's Llama inference repo (which contains the download script):

git clone https://github.com/facebookresearch/llama.git

Then run the download script:

cd llama
bash download.sh

It'll prompt you to enter the URL you got sent by Meta in an email. If you haven't signed up, do it here. They are surprisingly quick at sending you the email!

For this guide, you only need to download the 7B model.

2. Convert model to Hugging Face format

# Grab the conversion script from the transformers repo
wget https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
# Install transformers from source
pip install git+https://github.com/huggingface/transformers
# Install the current repo in editable mode (run from inside the cloned llama repo)
pip install -e .
# Convert the downloaded weights to Hugging Face format
python convert_llama_weights_to_hf.py \
    --input_dir llama-2-7b --model_size 7B --output_dir models_hf/7B

If you only downloaded the 7B model, make sure you move the model files into a directory called "7B". This is the structure of my directory:

llama-2-7b/
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk

This gives us a model in Hugging Face format that we can fine-tune using the Hugging Face libraries!
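As an optional sanity check (assuming the output path above), you can confirm the converted directory loads with transformers without pulling the full weights into memory:

from transformers import AutoConfig, LlamaTokenizer

# Only reads the config and tokenizer files, not the full weights
config = AutoConfig.from_pretrained("models_hf/7B")
tokenizer = LlamaTokenizer.from_pretrained("models_hf/7B")
print(config.hidden_size, tokenizer.vocab_size)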

3. Run the fine-tuning notebook

Clone the llama-recipes repo:

git clone https://github.com/facebookresearch/llama-recipes.git

Then open the quickstart.ipynb file in your preferred notebook interface. I use JupyterLab, like so:

pip install jupyterlab
jupyter lab # in the repo you want to work in

Then just run the whole notebook.

Make sure you change the line:

model_id="./models_hf/7B"

to the actual path of your converted model. And that's that! You will end up with a LoRA fine-tuned model.
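If the notebook doesn't already write a checkpoint where you want it, the PEFT-wrapped model can save just the adapter weights itself. A minimal sketch, assuming model is the trained PEFT model at the end of the notebook (the path is only an example):

# Saves only the LoRA adapter weights (a few MB), not the 7B base model
model.save_pretrained("samsungsumarizercheckpoint")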

4. Run inference on your fine-tuned model

The catch here is that Hugging Face only saves the adapter weights, not the full model, so we need to load the adapter weights into the full model. I struggled for a bit to find the right documentation for this... but eventually worked it out!

Import libraries:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig

Load the tokenizer and model:

model_id = "./models_hf/7B"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)

Load the adapter from where you saved it post-train:

model = PeftModel.from_pretrained(model, "/root/llama-recipes/samsungsumarizercheckpoint")

Run inference:

eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))
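Optionally, if you'd rather end up with a single standalone model, PEFT can fold the LoRA weights back into the base model with merge_and_unload. This is just a sketch (the output path is an example, and merging may require the base model to be loaded in fp16 rather than 8-bit):

# Fold the LoRA adapter into the base weights and save a standalone model
merged = model.merge_and_unload()
merged.save_pretrained("models_hf/7B-samsum-merged")
tokenizer.save_pretrained("models_hf/7B-samsum-merged")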

Next in this series, I'll show you how you can format your own dataset to train Llama 2 on a custom task!
