
Exemplar MAE with DDP does not work #1775

Open

mcleod-matthew-gene opened this issue Jan 7, 2025 · 2 comments

@mcleod-matthew-gene
Hello all,

Thanks for the great open source package. I noticed that the MAE example does not work when trained with PyTorch Lightning and DDP. There appears to be an issue with unused parameters, i.e.

RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.
    if torch.is_grad_enabled() and self.reducer._rebuild_buckets()

If you print which parameters do not have a gradient, you'll see they are vit.pos_embed, vit.head.weight, and vit.head.bias. The unused head parameters make sense, but I don't see why vit.pos_embed would be unused.
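(For reference, a minimal sketch of how such a check might look after a backward pass; model here is just a placeholder for the LightningModule:)

import torch

# After loss.backward() / a training step, list parameters without a gradient.
for name, param in model.named_parameters():
    if param.grad is None:
        print(name)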

I'd really appreciate it if you could confirm that this is an issue with the example on main and whether a fix is on the roadmap.

Thanks!

@guarin
Contributor

guarin commented Jan 8, 2025

Hi, thanks for raising the issue! This is indeed wrong in the example. You have to set strategy="ddp_find_unused_parameters_true" for it to work.
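For reference, a minimal sketch of how that looks in the Trainer (assuming a recent PyTorch Lightning; both forms below are equivalent):

import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

# Either pass the string shorthand ...
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_find_unused_parameters_true")

# ... or configure the strategy object explicitly.
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy=DDPStrategy(find_unused_parameters=True))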

If for some reason you cannot use ddp_find_unused_parameters_true and have to use ddp, you can also drop the unused classifier weights from the backbone. I believe this is possible with:

import pytorch_lightning as pl
from timm.models.vision_transformer import vit_base_patch32_224


class MAE(pl.LightningModule):
    def __init__(self):
        super().__init__()

        decoder_dim = 512
        vit = vit_base_patch32_224()
        # Remove the classification head so its unused weights are no longer
        # parameters that DDP expects gradients for.
        vit.reset_classifier(0, '')
        ...

Finally, if you want to reproduce the results from the paper, I suggest you follow the more complete implementation here:

Regarding the positional embedding: MAE uses a fixed 2D sin-cos positional embedding, and the corresponding parameter has requires_grad=False. See:

If I remember correctly, DDP expects all parameters to receive an update even if requires_grad=False (I might be wrong there, though). We have an open issue regarding this: #1434
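For context, a minimal sketch of the pattern that causes this (illustrative, not lightly's exact code): the positional embedding is registered as a parameter but frozen, so it never receives a gradient:

import torch
import torch.nn as nn


class EncoderWithFixedPosEmbed(nn.Module):
    def __init__(self, seq_len: int, embed_dim: int):
        super().__init__()
        # The parameter is part of the module (and therefore visible to DDP),
        # but requires_grad=False means it never gets a gradient.
        self.pos_embed = nn.Parameter(
            torch.zeros(1, seq_len, embed_dim), requires_grad=False
        )
        # The fixed 2D sin-cos values would be copied into pos_embed here.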

@liopeer
Contributor

liopeer commented Jan 8, 2025

@mcleod-matthew-gene In the paper you will see that they also use sinusoidal positional embeddings; see Masked Autoencoders Are Scalable Vision Learners, Appendix A.1 (ViT architecture):

Our MAE adds positional embeddings [57] (the sine-cosine version) to both the encoder and decoder inputs.

Therefore I would also suggest proceeding as suggested above.
