A mean is taken inside BaseVariationalLayer_.kl_div(), but later a sum is used inside get_kl_loss() and when reducing the KL loss of a layer's bias and weights (e.g. inside Conv2dReparameterization.kl_loss()).
Is there a mathematical justification for this? Why take the mean of the individual weight KL divergences within a layer, only to later sum across layers?
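For reference, here is a minimal sketch of what I mean (not the library's actual code; the function name `gaussian_kl` and its signature are mine). For a mean-field Gaussian posterior the ELBO's KL term factorizes into a plain sum of per-weight KLs, so averaging within a layer and then summing across layers appears to rescale each layer's contribution by 1/(number of weights in that layer):

```python
import torch

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p, reduction="mean"):
    """Closed-form KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)), element-wise."""
    kl = (torch.log(sigma_p / sigma_q)
          + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
          - 0.5)
    # "mean" divides by the number of weights in the tensor,
    # "sum" keeps every weight's full contribution to the ELBO.
    return kl.mean() if reduction == "mean" else kl.sum()

# Toy comparison: a "layer" with many weights vs. one with few.
big = torch.randn(1000)
small = torch.randn(10)
prior_mu, prior_sigma = torch.zeros(1), torch.ones(1)

kl_big = gaussian_kl(big, 0.5 * torch.ones_like(big), prior_mu, prior_sigma, "mean")
kl_small = gaussian_kl(small, 0.5 * torch.ones_like(small), prior_mu, prior_sigma, "mean")

# Summing these per-layer means across layers weights each layer's KL
# by 1 / (number of parameters in that layer), unlike a plain sum.
print(kl_big + kl_small)
```

With a plain sum everywhere, the large layer would dominate the KL term in proportion to its parameter count; with the mean-then-sum scheme every layer contributes on roughly the same scale, which is the behaviour I'd like to understand the justification for.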