Dear @craffel and T5 authors - thank you for all the hard work that went into the paper, codebase, and your amazing support of the community around it! 🙇 🙇 🙇
I was looking for the pre-training configs of what the paper dubs the standard language model experiments (Section 3.2.3 Objectives, architecture "Language model", objective "LM" in Table 2, i.e. an LM trained on a targets-only task). Did I get it right that the configs in the GCS bucket do not include these at the moment? I am asking because I could not find anything resembling a current objectives/lm.gin that would use @preprocessors.split_tokens_to_targets_length.
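For reference, something along these lines is what I was expecting to find. This is only my sketch by analogy with the other objective gins in the bucket, so the binding names may well be off:

```gin
# Hypothetical objectives/lm.gin-style config for the "full LM" objective.
# This is my guess, not an actual file from the bucket; the
# unsupervised.preprocessors binding and the surrounding chain are assumptions.
unsupervised.preprocessors = [
    @preprocessors.select_random_chunk,
    @preprocessors.reduce_concat_tokens,
    # Pack and split the token stream into targets-only examples of
    # sequence_length['targets'] tokens, with no 'inputs' feature at all.
    @preprocessors.split_tokens_to_targets_length,
]
```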
I was also a bit confused by the closest thing I could find, architectures/arch-lm_v1-prefix_lm, which has both run.model_type = 'lm' and @preprocessors.denoise (from objectives/prefix_lm.gin) at the same time (sketched below). From what I understand, such a setup would produce both inputs and targets, so it was not clear to me how an autoregressive single-stack Transformer is supposed to handle this situation (there seems to be a guard against it in Mesh TensorFlow during training, and the model type has to be delimited_lm for inputs and targets to be concatenated).
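For concreteness, the combination I mean is roughly the following. This is a paraphrase of my reading of the operative configs, not a verbatim copy of the files in the bucket:

```gin
# architectures/arch-lm_v1-prefix_lm: a decoder-only (single-stack) model.
run.model_type = 'lm'

# objectives/prefix_lm.gin: the denoise preprocessor with prefix-LM noise,
# which, as far as I can tell, emits BOTH 'inputs' and 'targets' features.
# The surrounding preprocessor list here is my paraphrase, not the exact file.
unsupervised.preprocessors = [
    @preprocessors.select_random_chunk,
    @preprocessors.reduce_concat_tokens,
    @preprocessors.split_tokens,
    @preprocessors.denoise,
]
```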
I would appreciate pointers in the right direction for pre-training and evaluating an LM,
thanks in advance!