5️⃣ Axolotl Fine-tuning with QLoRA

About Axolotl

Axolotl lets you fine-tune LLMs with simple CLI commands. It is a tool designed to streamline the fine-tuning of a wide range of AI models, and it supports many configurations and architectures.

GitHub - axolotl-ai-cloud/axolotl: Go ahead and axolotl questions

Features:

Fine-tune a variety of Hugging Face models such as Llama, Gemma, and Mixtral
Supports full fine-tuning, LoRA, QLoRA, ReLoRA, and GPTQ
Customize configurations with a simple yaml file or CLI overrides
Load different dataset formats, use custom formats, or bring your own pre-tokenized datasets
Integrates xformers, flash attention, rope scaling, and multipacking
Works with a single GPU or multiple GPUs via FSDP or DeepSpeed
Runs easily with Docker, locally or in the cloud
Logs results and, optionally, checkpoints to wandb (requires a linked Weights & Biases account)

You can run it directly from the command line, or from a Jupyter Notebook or Colab.

`Axolotl` Fine-tuning QLoRA Tutorial

import torch
# Training needs a CUDA-capable GPU, so fail early if none is visible
assert torch.cuda.is_available(), "No CUDA GPU is available"
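The config used later in this tutorial enables flash attention and bf16, which generally need an Ampere-class GPU or newer (compute capability 8.0 and up). The following is a minimal sketch of a capability check, not part of Axolotl itself. The helper name `capability_report` is hypothetical, and the "8.0 or newer" threshold is a rule of thumb that may not hold for every library version.

```python
def capability_report(major: int, minor: int) -> dict:
    """Summarize what a GPU with the given compute capability likely supports.

    Assumption: bf16 and flash-attn 2.x both need compute capability >= 8.0.
    """
    ampere_or_newer = major >= 8
    return {
        "compute_capability": f"{major}.{minor}",
        "bf16": ampere_or_newer,
        "flash_attention_2": ampere_or_newer,
    }


try:
    import torch
except ImportError:  # lets the helper above be used without torch installed
    torch = None

if torch is not None and torch.cuda.is_available():
    # get_device_capability returns a (major, minor) tuple for the device
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), capability_report(major, minor))
```

If the report shows `flash_attention_2: False` (for example on a compute capability 7.5 card), setting `flash_attention: false` in the yaml is probably the safer choice.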

Install Axolotl & dependencies

Install Axolotl from its GitHub repository, together with the libraries used for acceleration and optimization.

%pip install torch=="2.1.2"
%pip install -e git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl

%cd axolotl
%pip install flash-attn=="2.5.0"
%pip install deepspeed=="0.13.1"
%pip install mlflow=="2.13.0"

Create yaml config file

Create the configuration yaml file. The model, dataset, Accelerate, Flash Attention, PEFT, LoRA, and other options can all be set in this one file. Here we use the following settings.

base_model: google/gemma-2b-it
Dataset: nlpai-lab/databricks-dolly-15k-ko
Adapter: QLoRA

import yaml

# Your YAML string
yaml_string = """
# use google/gemma-7b if you have access
base_model: google/gemma-2b-it
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

# huggingface repo
datasets:
  - path: nlpai-lab/databricks-dolly-15k-ko
    type: alpaca
val_set_size: 0.1
output_dir: ./outputs/out

adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:


gradient_accumulation_steps: 3
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: 
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:


"""

# Convert the YAML string to a Python dictionary
yaml_dict = yaml.safe_load(yaml_string)

# Specify your file path
file_path = 'gemma-2b_axolotl.yaml'

# Write the YAML file
with open(file_path, 'w') as file:
    yaml.dump(yaml_dict, file)
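A few quantities follow directly from the values in this config, and working them out helps when adjusting it for a different GPU. The sketch below copies the relevant values into a plain dictionary. The helper names and the `num_gpus` parameter are hypothetical and only for illustration. The scaling factor uses the standard LoRA definition, alpha / r.

```python
# Values copied from the yaml config above
config = {
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 3,
    "sequence_len": 4096,
    "lora_r": 32,
    "lora_alpha": 16,
}


def effective_batch_size(cfg: dict, num_gpus: int = 1) -> int:
    """Sequences that contribute to one optimizer step across all GPUs."""
    return cfg["micro_batch_size"] * cfg["gradient_accumulation_steps"] * num_gpus


def max_tokens_per_step(cfg: dict, num_gpus: int = 1) -> int:
    """Upper bound on tokens per optimizer step (every sequence fully packed)."""
    return effective_batch_size(cfg, num_gpus) * cfg["sequence_len"]


def lora_scaling(cfg: dict) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return cfg["lora_alpha"] / cfg["lora_r"]


print(effective_batch_size(config))  # 6
print(max_tokens_per_step(config))   # 24576
print(lora_scaling(config))          # 0.5
```

If memory runs out, lowering `micro_batch_size` while raising `gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged.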

Launch the training

Now start fine-tuning with a single CLI command, pointing it at the gemma-2b_axolotl.yaml file created above.

!accelerate launch -m axolotl.cli.train gemma-2b_axolotl.yaml
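The `!` prefix only works inside a notebook. When running from a plain Python script, the same command can be assembled and launched with `subprocess`. This is a minimal sketch; the function names `build_train_command` and `run_training` are hypothetical, and the command itself is exactly the one shown above.

```python
import shlex
import subprocess


def build_train_command(config_path: str) -> list[str]:
    """Return the accelerate command that trains with the given yaml config."""
    return ["accelerate", "launch", "-m", "axolotl.cli.train", config_path]


def run_training(config_path: str) -> None:
    """Launch training and raise CalledProcessError if the run fails."""
    subprocess.run(build_train_command(config_path), check=True)


# Print the command for inspection; call run_training(...) to actually start it
print(shlex.join(build_train_command("gemma-2b_axolotl.yaml")))
```

Passing the command as a list avoids shell quoting problems if the config path contains spaces.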

When training finishes, the fine-tuning results are saved to the path set in the yaml, output_dir: ./outputs/out. From there they can be pushed to the Hugging Face Hub.
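One way to push the saved output to the Hugging Face Hub is the `huggingface_hub` library. The sketch below assumes that library is installed and that you are already logged in (for example via `huggingface-cli login`). The function name `push_adapter` and the repository id in the commented call are placeholders, not values from this tutorial.

```python
from pathlib import Path


def push_adapter(output_dir: str, repo_id: str) -> None:
    """Upload everything in output_dir to a model repo on the Hugging Face Hub."""
    folder = Path(output_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"No training output found at {folder}")

    # Imported lazily so the path check above works without the library
    from huggingface_hub import HfApi

    api = HfApi()
    # Create the repo if it does not exist yet, then upload the whole folder
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(folder_path=str(folder), repo_id=repo_id, repo_type="model")


# Example call (placeholder repo id, replace with your own):
# push_adapter("./outputs/out", "your-username/your-model-name")
```

The upload includes every file in the folder, so intermediate checkpoints may be worth removing first if only the final adapter is needed.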

Inference

Inference can be launched the same way. Adding --gradio serves the model through a Gradio interface automatically.

!accelerate launch -m axolotl.cli.inference gemma-2b_axolotl.yaml \
    --lora_model_dir="./outputs/out" --gradio


