
regularization of Flux.params leads to increasing runtime and memory usage #1347

Closed · MariusDrulea opened this issue Dec 27, 2022 · 3 comments

@MariusDrulea
I originally noticed this issue in FluxML/model-zoo#383. It appears to be a Zygote issue, so I have created a dedicated issue here for better traceability.

In the following MWE, loss_slow triggers compilation at every iteration, and both its runtime and memory usage grow with each call. It looks like loss_slow causes Zygote to keep accumulating data. In the vae_mnist example the effect is severe: the runtime starts at 4 minutes/epoch and reaches 1.5 hours/epoch.

The equivalent loss_explicit function behaves as expected.

using Flux
using LinearAlgebra: norm  # `norm` lives in LinearAlgebra, not Flux

model = Dense(2, 2)

# Implicit style: collect the parameters via Flux.params inside the loss.
loss_slow(m) = sum(p -> norm(p), Flux.params(m))
# Explicit style: reference the layer's fields directly.
loss_explicit(m) = norm(m.weight) + norm(m.bias)

for i in 1:10
    @time ∇m_slow = gradient(m -> loss_slow(m), model)
end

for i in 1:10
    @time ∇m_explicit = gradient(m -> loss_explicit(m), model)
end

Here is the output:

loss_slow:
 23.518778 seconds (62.17 M allocations: 3.153 GiB, 3.73% gc time, 99.94% compilation time)
  0.018303 seconds (4.03 k allocations: 183.281 KiB, 93.40% compilation time)
  0.018860 seconds (5.14 k allocations: 231.125 KiB, 93.63% compilation time)
  0.019585 seconds (6.24 k allocations: 281.562 KiB, 91.42% compilation time)
  0.019242 seconds (7.33 k allocations: 324.969 KiB, 92.79% compilation time)
  0.019103 seconds (8.44 k allocations: 376.188 KiB, 90.87% compilation time)
  0.019514 seconds (9.53 k allocations: 419.500 KiB, 91.37% compilation time)
  0.019786 seconds (10.63 k allocations: 467.250 KiB, 90.60% compilation time)
  0.022090 seconds (11.73 k allocations: 514.031 KiB, 91.70% compilation time)
  0.019207 seconds (12.83 k allocations: 561.297 KiB, 90.98% compilation time)
  0.038078 seconds (73.32 k allocations: 3.669 MiB, 99.70% compilation time)

loss_explicit:
  0.000017 seconds (29 allocations: 1.766 KiB)
  0.000015 seconds (29 allocations: 1.766 KiB)
  0.000006 seconds (29 allocations: 1.766 KiB)
  0.000005 seconds (29 allocations: 1.766 KiB)
  0.000005 seconds (29 allocations: 1.766 KiB)
  0.000006 seconds (29 allocations: 1.766 KiB)
  0.000006 seconds (29 allocations: 1.766 KiB)
  0.000004 seconds (29 allocations: 1.766 KiB)
  0.000004 seconds (29 allocations: 1.766 KiB)
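
As a side note, writing out every field as in loss_explicit does not scale beyond toy models. Below is a minimal sketch of a generic explicit-mode regularizer, assuming a newer Flux where Flux.trainables (re-exported from Optimisers.jl) is available and differentiable; the Chain model and the l2_penalty name are just for illustration:

using Flux
using LinearAlgebra: norm

model = Chain(Dense(2 => 4, relu), Dense(4 => 2))

# trainables(m) returns the model's trainable parameter arrays as a flat
# vector, so the penalty covers every layer without naming its fields.
l2_penalty(m) = sum(norm, Flux.trainables(m))

∇m = gradient(m -> l2_penalty(m), model)

Like loss_explicit, this stays entirely in the explicit-parameter style, so it should not hit the per-iteration compilation shown above.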
@ToucheSir
Member

If an issue can't be reproduced without Flux, it's probably a Flux issue ;). Do you have a Flux-free MWE? If not, then it'd be better to reopen FluxML/Flux.jl#2040 and add your comment to that thread.
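
For reference, an untested sketch of what a Flux-free reduction might look like: construct a Zygote.Params inside the differentiated function, mimicking what Flux.params(m) does. The Affine struct and loss below are hypothetical stand-ins, not from the original thread:

using Zygote
using LinearAlgebra: norm

struct Affine          # hypothetical stand-in for Dense; no Flux involved
    W::Matrix{Float64}
    b::Vector{Float64}
end

m = Affine(randn(2, 2), randn(2))

# Build the Params collection inside the loss, as Flux.params(m) would.
loss(m) = sum(norm, Zygote.Params([m.W, m.b]))

for i in 1:5
    @time gradient(m -> loss(m), m)   # watch whether compilation recurs each call
end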

@MariusDrulea
Author

@ToucheSir I have moved my comment there. Please reopen #2040, as I don't have the rights to do so.

@MariusDrulea
Author

Duplicate of FluxML/Flux.jl#2040.

@MariusDrulea closed this as not planned (duplicate) on Dec 27, 2022.