TV Show Script Generation
Notes for the NLP project~
Questions and Solutions
1. The Llama 2 model takes more than 40 GB of GPU RAM to train.
Solution:
LoRA: Low-Rank Adaptation of Large Language Models
Concepts:
Instead of updating all parameters in the model, we add a trainable low-rank update to some of the weight matrices (e.g., the Q, K, V projection matrices in a transformer).
The original weight matrix is frozen; only the low-rank factors are trained, and their product is added to the frozen weights (sketched below).
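In equations (notation from the LoRA paper): for a frozen weight matrix $W_0 \in \mathbb{R}^{d \times k}$, the adapted forward pass is
$h = W_0 x + \Delta W x = W_0 x + B A x$, with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, $r \ll \min(d, k)$,
so each adapted matrix trains only $r(d + k)$ parameters instead of $d k$.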
Method:
Use LoraConfig to add a LoRA adapter to the model. LoraConfig is provided by Hugging Face's peft library.
from peft import get_peft_model, LoraConfig, TaskType
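A minimal end-to-end sketch of this step (the base-model name and the dropout value are assumptions for illustration; r, lora_alpha, and target_modules follow the values quoted in the Reference section below):

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

# Load the base model; its original weights stay frozen.
# (Model name is an example; substitute the Llama 2 checkpoint the project uses.)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA configuration: low-rank adapters only on the query/value projections.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,          # causal LM for script generation
    r=8,                                   # rank of the low-rank factors A, B
    lora_alpha=32,                         # scaling factor alpha
    lora_dropout=0.05,                     # dropout on the LoRA branch (assumed value)
    target_modules=["q_proj", "v_proj"],   # W_q and W_v only
)

# Wrap the model: base weights are frozen, only the LoRA matrices are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

For a 7B model with this config, print_trainable_parameters() should report well under 1% of the parameters as trainable, which removes most of the gradient and optimizer-state memory; the frozen base weights still have to fit in GPU memory (or be quantized).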
Reference
Hu et al., 2021, “LoRA: Low-Rank Adaptation of Large Language Models” (arXiv:2106.09685)
“In self-attention layers, we only apply LoRA to the query and value projection matrices (i.e., W_q and W_v), since modifying the key matrix W_k is less helpful empirically.”
target_modules=["q_proj", "v_proj"]
$\Delta W = BA$
“For GPT-2 models, we set rank r=8 and scaling α=32 by default, unless otherwise stated.”
$r = 8, \alpha = 32$
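Note on how $r$ and $\alpha$ interact (from the LoRA paper): the low-rank update is scaled by $\alpha / r$ before being added, i.e.
$h = W_0 x + \frac{\alpha}{r} B A x$,
so with $r = 8$ and $\alpha = 32$ the LoRA branch is multiplied by 4. The lora_alpha argument in peft's LoraConfig is this $\alpha$.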