Dear @craffel and T5 authors - thank you for all the hard work that went into the paper, codebase, and your amazing support of the community around it! 🙇 🙇 🙇
I was looking for the pre-training configs of what the paper dubs the standard language model experiments (Section 3.2.3 Objectives, architecture "Language model", objective "LM" in Table 2, i.e. an LM trained on a targets-only task). Did I get it right that the configs in the GCS bucket do not include these at the moment? I am asking because I could not find anything resembling a current objectives/lm.gin that would use @preprocessors.split_tokens_to_targets_length.
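For reference, something along these lines is what I was expecting to find. This is only my sketch by analogy with the other objective gins in the bucket, so the binding names may well be off:

```gin
# Hypothetical objectives/lm.gin-style config for the "full LM" objective.
# This is my guess, not an actual file from the bucket; the
# unsupervised.preprocessors binding and the surrounding chain are assumptions.
unsupervised.preprocessors = [
    @preprocessors.select_random_chunk,
    @preprocessors.reduce_concat_tokens,
    # Pack and split the token stream into targets-only examples of
    # sequence_length['targets'] tokens, with no 'inputs' feature at all.
    @preprocessors.split_tokens_to_targets_length,
]
```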
I was also a bit confused by the closest thing I could find, architectures/arch-lm_v1-prefix_lm, which has both run.model_type = 'lm' and @preprocessors.denoise (from objectives/prefix_lm.gin) at the same time (sketched below). From what I understand, such a setup would produce both inputs and targets, so it was not clear to me how an autoregressive single-stack Transformer is supposed to handle this situation (there seems to be a guard against it in Mesh TensorFlow during training, and the model type has to be delimited_lm for inputs and targets to be concatenated).
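For concreteness, the combination I mean is roughly the following. This is a paraphrase of my reading of the operative configs, not a verbatim copy of the files in the bucket:

```gin
# architectures/arch-lm_v1-prefix_lm: a decoder-only (single-stack) model.
run.model_type = 'lm'

# objectives/prefix_lm.gin: the denoise preprocessor with prefix-LM noise,
# which, as far as I can tell, emits BOTH 'inputs' and 'targets' features.
# The surrounding preprocessor list here is my paraphrase, not the exact file.
unsupervised.preprocessors = [
    @preprocessors.select_random_chunk,
    @preprocessors.reduce_concat_tokens,
    @preprocessors.split_tokens,
    @preprocessors.denoise,
]
```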
I would appreciate pointers in the right direction for pre-training and evaluating an LM,
thanks in advance!