Finetuning T5 large with QLoRA on XSUM dataset
Overview
This guide provides a detailed walkthrough for finetuning the T5-Large model on the XSum dataset with QLoRA using nyuntam-adapt. QLoRA is a PEFT technique in which the original weights are frozen to reduce the number of trainable parameters and are quantized to reduce memory usage.
Introduction
In this example we will be finetuning a T5 large model for text summarization on the xsum dataset using QLoRA. QLoRA (Quantized LoRA) allows us to finetune a large model with a small memory requirement by freezing and quantizing the original model weights and only training the LoRA adapters. The adapters are then merged while saving the model.
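The snippet below is an illustrative sketch of the QLoRA recipe itself, written directly with 🤗 transformers, peft, and bitsandbytes rather than nyuntam-adapt; the quantization and LoRA hyperparameters mirror the YAML configuration shown later in this guide, and the choice of target modules is left to peft's defaults for T5.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base weights to 4-bit NF4 with fp16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForSeq2SeqLM.from_pretrained(
    "google-t5/t5-large",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Attach small trainable LoRA adapters on top of the quantized, frozen weights
lora_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.1,
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients
```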
Requirements
Before you begin, ensure that you have the following:
- A GPU-enabled environment with CUDA support.
- The Nyuntam repository cloned and set up as per the Installation Guide.
- Docker
Installation
Step 1: Clone the Nyuntam Repository
Clone the repository and navigate to the nyuntam directory:
$ git clone https://github.com/nyunAI/nyuntam.git
$ cd nyuntam
Step 2: Set Up the Workspace
To set up the environment, use the following commands:
pip install git+https://github.com/nyunAI/nyunzero-cli.git
nyun init {WORKSPACE_PATH} -e adapt
Dataset
The XSum dataset is used for this example. The dataset is loaded directly from the 🤗 Hugging Face Hub.
Sample:

| document | summary | id |
|----------|---------|----|
| The full cost of damage in Newton Stewart, one of the areas worst affec… | Clean-up operations are continuing across the Scottish Borders and Dumf… | 35232142 |
| A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 … | Two tourist buses have been destroyed by fire in a suspected arson atta… | 40143035 |
| Ferrari appeared in a position to challenge until the final laps, when … | Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead… | 35951548 |
| John Edward Bates, formerly of Spalding, Lincolnshire, but now living i… | A former Lincolnshire Police officer carried out a series of sex attack… | 36266422 |
| Patients and staff were evacuated from Cerahpasa hospital on Wednesday … | An armed man who locked himself into a room at a psychiatric hospital i… | 38826984 |
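For a quick look at the data outside of nyuntam-adapt, the dataset can be inspected with the 🤗 datasets library. This is an illustrative check only; depending on your datasets version, script-based datasets such as XSum may additionally require passing trust_remote_code=True.

```python
from datasets import load_dataset

# Load a small slice of XSum from the Hugging Face Hub for inspection
xsum = load_dataset("EdinburghNLP/xsum", split="train[:5]")

for example in xsum:
    print(example["id"], "-", example["summary"])
```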
Configuration
The following YAML file is used to set up the experiment:
JOB_SERVICE : Adapt
JOB_ID: SUMM
TASK : Seq2Seq_tasks
subtask : summarization
max_input_length : 512
max_target_length : 128
eval_metric : 'rouge'
cuda_id : '0'
OUTPUT_DIR : "/user_data/jobs/Adapt/SUMM"
OVERWRITE_OUTPUT_DIR : False
LOGGING_PATH: "/user_data/logs/Adapt/SUMM"
packing : True
dataset_text_field : 'text'
max_seq_length : 512
flash_attention2 : false
blocksize : 128
SAVE_METHOD : 'state_dict'
# DATASET_ARGS :
DATASET : 'EdinburghNLP/xsum'
DATA_VERSION : '1.0'
MAX_TRAIN_SAMPLES : 1000
MAX_EVAL_SAMPLES : 1000
DATASET_CONFIG : {}
input_column : 'document'
target_column : 'summary'
# MODEL_ARGS :
MODEL : "t5"
MODEL_PATH : 'google-t5/t5-large'
MODEL_VERSION : '1.0'
CACHE_BOOL : False
# TRAINING_ARGS :
SEED : 56
DO_TRAIN : True
DO_EVAL : True
NUM_WORKERS : 4
BATCH_SIZE : 16
EPOCHS : 1
STEPS : 1
OPTIMIZER : 'adamw_torch'
LR : 1e-4
SCHEDULER_TYPE : 'linear'
WEIGHT_DECAY : 0.0
BETA1 : 0.9
BETA2 : 0.999
ADAM_EPS : 1e-8
INTERVAL : 'epoch'
INTERVAL_STEPS : 100
NO_OF_CHECKPOINTS : 5
FP16 : False
RESUME_FROM_CHECKPOINT : False
GRADIENT_ACCUMULATION_STEPS : 1
GRADIENT_CHECKPOINTING : True
predict_with_generate: True
generation_max_length : 128
REMOVE_UNUSED_COLUMNS : True
# FINE_TUNING_ARGS :
LAST_LAYER_TUNING : True
FULL_FINE_TUNING : False
PEFT_METHOD : 'LoRA'
# LoRA_CONFIG :
r : 16
alpha : 8
dropout : 0.1
peft_type : 'LoRA'
target_modules :
fan_in_fan_out : False
init_lora_weights : True
# BNB_CONFIG :
load_in_4bit : True
bnb_4bit_compute_dtype : "float16"
bnb_4bit_quant_type : "nf4"
bnb_4bit_use_double_quant : False
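To make the length-related fields concrete, the sketch below shows one way the input_column/target_column and max_input_length/max_target_length settings translate into tokenization for T5 summarization. The exact preprocessing performed by nyuntam-adapt may differ; the "summarize: " task prefix is the conventional T5 choice and is an assumption here, not a confirmed detail of the library.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-large")

def preprocess(example):
    # Truncate documents to max_input_length (512) tokens
    model_inputs = tokenizer(
        "summarize: " + example["document"],
        max_length=512,
        truncation=True,
    )
    # Truncate reference summaries to max_target_length (128) tokens
    labels = tokenizer(
        text_target=example["summary"],
        max_length=128,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```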
Adapting the model
With the YAML file configured, the adaptation process is initiated with the following command:
nyun run examples/adapt/summarization/config.yaml
Once the job starts, you will find the following directory structure in the user_data folder:
user_data/
├── jobs
│   └── Adapt
│       └── SUMM
└── logs
    └── Adapt
        └── SUMM
            └── log.log
The output model will be stored in the user_data/jobs/Adapt/SUMM/ directory, and the final directory structure will be:
user_data/
├── jobs
│   └── Adapt
│       └── SUMM
│           └── merged_model_state_dict.pth
└── logs
    └── Adapt
        └── SUMM
            └── log.log
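Since SAVE_METHOD is 'state_dict', the merged weights are saved as a plain PyTorch state dict. A minimal way to load them back for inference might look like the following; it assumes the state-dict keys match the Hugging Face T5 implementation, so verify this against your own output file.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Rebuild a T5-Large skeleton and load the merged (base + LoRA) weights into it
model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-large")
state_dict = torch.load(
    "user_data/jobs/Adapt/SUMM/merged_model_state_dict.pth", map_location="cpu"
)
model.load_state_dict(state_dict)
model.eval()

# Generate a summary for a sample document
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-large")
inputs = tokenizer("summarize: " + "Your article text here.", return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```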
Conclusion
This guide has walked you through the process of adapting the T5-Large model using QLoRA for summarization on the XSum dataset. By employing QLoRA, we efficiently fine-tuned the model with a reduced memory footprint. The configuration and setup steps were outlined, showing that quantized low-rank adaptation of a large model is manageable with a single YAML configuration. The final trained model and logs are organized in a clear directory structure, making it easy to retrieve and analyze results.
Author: Panigrahi, Abhranta