In this guide, I show how you can fine-tune Llama 2 to be a dialog summarizer!
Last weekend, I wanted to fine-tune Llama 2 (which now reigns supreme on the Open LLM leaderboard) on a dataset built from my own collection of Google Keep notes; each note has a title and a body, so I wanted to train Llama to generate the body from a given title.
This first part of the tutorial covers fine-tuning Llama 2 on the samsum dialog summarization dataset using Hugging Face libraries. I tend to find that while Hugging Face has built a superb library in transformers, their guides tend to overcomplicate things for the average Joe. The second part, fine-tuning on custom data, is here!
To get started, use this one-click link to get yourself an L4, A10, or A100 (or any GPU with more than 24GB of GPU memory).
Build your Verb container:
Once you've checked out your machine and landed in your instance's page, select Python 3.10 and CUDA 12.0.1 and click the "Build" button to build your Verb container. Give this a few minutes.
Open your new Brev Notebook:
Once the Verb container is finished loading, click the 'Notebook' button on the top right of your screen once it illuminates. You will be taken to a Jupyter Lab environment. Under "Other", click "Terminal". Run the following commands.
Note that you can also SSH into the development environment and run the commands below from there, via
`brev open [your-machine-name]` (to enter via VS Code) or
`brev shell [your-machine-name]` (to enter via a shell). For these, you will need to have the Brev CLI installed; you can install it here.
You can also install a zsh kernel for Jupyter Notebook, which lets you run the following commands in cells, via `pip install zsh-jupyter-kernel` in the Terminal.
1. Download the model
Clone Meta's Llama inference repo (which contains the download script):

```shell
git clone https://github.com/facebookresearch/llama.git
```

Then run the download script:

```shell
cd llama
bash download.sh
```

It will prompt you to enter the URL you got sent by Meta in an email. If you haven't signed up, do it here. They are surprisingly quick at sending you the email!
For this guide, you only need to download the 7B model.
2. Convert model to Hugging Face format
```shell
pip install git+https://github.com/huggingface/transformers
pip install protobuf accelerate bitsandbytes scipy
pip install -e .
python convert_llama_weights_to_hf.py \
    --input_dir llama-2-7b --model_size 7B --output_dir llama-2-7b/7B
```
If you originally only downloaded the 7B model, you need to make sure you move the model files into a directory called `7B`. You may also need to move the `tokenizer*` files into `llama-2-7b`. Use this structure for your directory:

```
llama-2-7b
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
```
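Before running the conversion, it can help to sanity-check that the files are where the script expects them. The snippet below is a small self-contained sketch — the `missing_files` helper is my own addition (not part of any repo above), and it demonstrates the check against a mock checkout so it runs anywhere:

```python
import os
import tempfile

# The files the conversion step expects, relative to the llama-2-7b root
# (layout as described above; your tokenizer filenames may vary slightly).
EXPECTED = [
    "7B/checklist.chk",
    "7B/consolidated.00.pth",
    "7B/params.json",
    "tokenizer.model",
]

def missing_files(root: str) -> list[str]:
    """Return the expected files that are absent under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

# Demo against a mock checkout so this script is runnable anywhere:
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "7B"))
    for p in EXPECTED:
        open(os.path.join(root, p), "w").close()
    print("missing:", missing_files(root))  # missing: []
```

Point `missing_files` at your real `llama-2-7b` directory; if it returns an empty list, the conversion script should find everything it needs.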
This now gives us a Hugging Face model that we can fine-tune using Hugging Face libraries!
3. Run the fine-tuning notebook:
Clone the llama-recipes repo and move the quickstart notebook:

```shell
git clone https://github.com/facebookresearch/llama-recipes.git
mv llama-recipes/examples/quickstart.ipynb llama-recipes/src/llama-recipes
```
Then, using the file navigator on the left, navigate to quickstart.ipynb inside `llama-recipes`. Uncomment the `pip install` command in the first cell and run it to install the remaining requirements, then restart the kernel. Then, run the rest of the notebook.
In the notebook, add a cell, and insert & run:
In Step 1, change the line:

```python
from utils.dataset_utils import get_preprocessed_dataset
```

to

```python
from llama_recipes.utils.dataset_utils import get_preprocessed_dataset
```

and in Step 2, change the line:

```python
from configs.datasets import samsum_dataset
```

to

```python
from llama_recipes.configs.datasets import samsum_dataset
```
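For context on what that samsum preprocessing does: each training example wraps the dialog in a summarization prompt before tokenization. Here is a rough illustrative sketch — `format_samsum_prompt` is a hypothetical helper of my own, and the exact template llama-recipes uses may differ:

```python
def format_samsum_prompt(dialog: str, summary: str = "") -> str:
    """Wrap a dialog in a summarization prompt (illustrative sketch only;
    the real llama-recipes template may differ)."""
    prompt = f"Summarize this dialog:\n{dialog}\n---\nSummary:\n"
    return prompt + summary

example = format_samsum_prompt(
    "Amanda: I baked cookies. Do you want some?\nJerry: Sure!",
    "Amanda baked cookies and offers some to Jerry.",
)
print(example)
```

At training time the model sees the prompt plus the reference summary; at inference time you stop after `Summary:` and let the model generate the rest.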
And that's that! You will end up with a LoRA fine-tuned model, and in Step 8, you can run inference on your fine-tuned model.
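If you are curious what the "LoRA" part means in practice: instead of updating the full weight matrices, LoRA trains a pair of small low-rank matrices per layer and adds their product to the frozen weights. Here is a minimal NumPy sketch of the idea (illustrative shapes and values only, not the PEFT implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 16           # hidden size, adapter rank, scaling (illustrative)
W = rng.standard_normal((d, d))  # frozen base weight matrix

# LoRA learns two small matrices A (r x d) and B (d x r); the effective
# weight at inference time is W + (alpha / r) * B @ A.
A = rng.standard_normal((r, d))
B = rng.standard_normal((d, r))
W_adapted = W + (alpha / r) * (B @ A)

# The update touches every entry of W but has rank at most r, so only
# 2 * d * r numbers need to be stored per adapted matrix.
update_rank = np.linalg.matrix_rank(W_adapted - W)
print("update rank:", update_rank)  # at most r = 2
```

This is why the fine-tune produces a tiny adapter checkpoint rather than a second 7B model: only the small `A` and `B` matrices are saved.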
Next in this series, I'll show you how you can format your own dataset to train Llama 2 on a custom task!