Large variance in convergence between T5 and T5.1 #959
Unanswered
pablogranolabar asked this question in Q&A
Replies: 0 comments
Hello!
I am fine-tuning a T5 model for multilabel text classification and recently integrated the T5.1 checkpoints into the training pipeline. However, the initial loss values and convergence rates are night and day: T5 starts to converge after a single epoch, while T5.1 shows high loss values and takes much longer to converge, if it converges at all. My model params are:
```
train_batch_size: 2
eval_batch_size: 16
fp16: false
learning rate: 3e-4 (and 1e-4)
max_seq_length: 512
max_source_length: 512
max_target_length: 96
model_class: str = "T5Model"
dataset_class: Dataset = None
do_sample: bool = False
early_stopping: bool = True
evaluate_generated_text: bool = False
length_penalty: float = 2.0
max_length: int = 20
max_steps: int = -1
num_beams: int = 1
num_return_sequences: int = 1
preprocess_inputs: bool = True
repetition_penalty: float = 1.0
scheduler: str = "constant_schedule_with_warmup"
adafactor_relative_step: bool = False
adafactor_scale_parameter: bool = False
adafactor_warmup_init: bool = False
learning_rate: float = 1e-3
optimizer: str = "Adafactor"
special_tokens_list: list = field(default_factory=list)
top_k: float = None
top_p: float = None
use_multiprocessed_decoding: bool = True
```
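For context, here is a minimal sketch of how the Adafactor-related args above are typically passed to the `Adafactor` implementation in Hugging Face `transformers` (which simpletransformers-style pipelines use under the hood). The `model` stand-in and `num_warmup_steps=0` are assumptions for illustration, not values from my pipeline:

```python
import torch
from transformers.optimization import Adafactor, get_constant_schedule_with_warmup

model = torch.nn.Linear(4, 2)  # stand-in for the actual T5 model

# With relative_step=False, Adafactor requires an explicit lr;
# the dumped args use learning_rate=1e-3 (3e-4 and 1e-4 were also tried).
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,  # adafactor_scale_parameter: False
    relative_step=False,    # adafactor_relative_step: False
    warmup_init=False,      # adafactor_warmup_init: False
)

# scheduler: "constant_schedule_with_warmup"
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=0)
```

Note that these settings disable Adafactor's internal relative-step schedule, so the learning rate is governed entirely by the explicit `lr` and the constant-with-warmup scheduler.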
Any thoughts would be greatly appreciated.
TIA