Large variance in convergence between T5 and T5.1 #959
Unanswered
pablogranolabar asked this question in Q&A
Replies: 0 comments
Hello!
I am fine-tuning a T5 model for multilabel text classification and recently integrated the T5.1 checkpoints into the training pipeline. However, the initial loss values and convergence rates are night and day: T5 starts to converge after a single epoch, while T5.1 shows high loss values and takes much longer to converge, if it converges at all. My model params are:
```
train_batch_size: 2
eval_batch_size: 16
fp16: false
learning rate: 3e-4 (and 1e-4)
max_seq_length: 512
max_source_length: 512
max_target_length: 96
model_class: str = "T5Model"
dataset_class: Dataset = None
do_sample: bool = False
early_stopping: bool = True
evaluate_generated_text: bool = False
length_penalty: float = 2.0
max_length: int = 20
max_steps: int = -1
num_beams: int = 1
num_return_sequences: int = 1
preprocess_inputs: bool = True
repetition_penalty: float = 1.0
scheduler: str = "constant_schedule_with_warmup"
adafactor_relative_step: bool = False
adafactor_scale_parameter: bool = False
adafactor_warmup_init: bool = False
learning_rate: float = 1e-3
optimizer: str = "Adafactor"
special_tokens_list: list = field(default_factory=list)
top_k: float = None
top_p: float = None
use_multiprocessed_decoding: bool = True
```
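For context, here is a minimal sketch of how the Adafactor-related args above are typically passed to the `Adafactor` implementation in Hugging Face `transformers` (which simpletransformers-style pipelines use under the hood). The `model` stand-in and `num_warmup_steps=0` are assumptions for illustration, not values from my pipeline:

```python
import torch
from transformers.optimization import Adafactor, get_constant_schedule_with_warmup

model = torch.nn.Linear(4, 2)  # stand-in for the actual T5 model

# With relative_step=False, Adafactor requires an explicit lr;
# the dumped args use learning_rate=1e-3 (3e-4 and 1e-4 were also tried).
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,  # adafactor_scale_parameter: False
    relative_step=False,    # adafactor_relative_step: False
    warmup_init=False,      # adafactor_warmup_init: False
)

# scheduler: "constant_schedule_with_warmup"
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=0)
```

Note that these settings disable Adafactor's internal relative-step schedule, so the learning rate is governed entirely by the explicit `lr` and the constant-with-warmup scheduler.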
Any thoughts would be greatly appreciated.
TIA