Run Llama 2 with Brev

In this guide, we show you how to run Llama 2 with Brev! As always, we've pre-configured the environment for you, so there's no setup required on your end.

If you're looking for a fine-tuning guide, check out our Llama 2 fine-tuning post.

  1. Sign up for an account
  2. You'll be redirected to an instance creation page pre-configured with the defaults you need.
  3. Open the "Get Software" card. (We have a minor bug with environment variables, so you need to open this card to set them.)
  4. At the bottom, add a payment method. We give you 30 minutes free, but we need to make sure people don't abuse our systems 🙂
  5. Hit create!

Open your new Brev instance:

brev open llama2

If you don't have the Brev CLI, you can install it here.
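If you'd rather install from the terminal, Brev's docs describe a Homebrew-based install (the exact tap name below is taken from Brev's documentation at the time of writing; treat it as an assumption):

```shell
# Install the Brev CLI via Homebrew, then authenticate your machine
brew install brevdev/homebrew-brev/brev
brev login
```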

Running the model:

Once VSCode opens, run this:
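The command isn't shown here; assuming the instance comes with Meta's llama repository already cloned (as Brev's pre-configured environment suggests), the step is the repo's download helper:

```shell
# Run Meta's weight-download script from inside the llama repo checkout;
# it will ask for the signed download URL from Meta's email
./download.sh
```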


This will prompt you to enter the URL you got sent by Meta in an email. If you haven't signed up, do it here. They are surprisingly quick at sending you the email!

Then refresh the shell with:
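The exact refresh command was omitted here; assuming a bash shell, re-sourcing your profile is one common way to pick up the newly set environment variables:

```shell
# Re-read the bash config so environment variables set above take effect
source ~/.bashrc
```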


And run the 7B completion model with:

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

To run the 7B-chat model, run:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
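Meta ships the larger checkpoints as model-parallel shards, and --nproc_per_node must match the shard count (1 for 7B, 2 for 13B, 8 for 70B). As a sketch, assuming you've also downloaded the 13B chat weights:

```shell
# 13B weights ship as 2 model-parallel shards, so 2 processes are required
torchrun --nproc_per_node 2 example_chat_completion.py \
    --ckpt_dir llama-2-13b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
```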

Happy LLMing!