Sparse backslash #363
I saw that there were many discussions revolving around moving ForwardDiff.jl's dual-number implementation into DualNumbers.jl and got a bit lost in those. What is the current status of this? Is there still an ongoing discussion?
I guess a solution could actually define a new factor type that stores the factorization of the real part and the dual part of the matrix. So, for a dual-valued matrix, something like:

```julia
struct DualFactors
    RealFactors
    DualMatrix
end

function LinearAlgebra.factorize(M::AbstractArray{Dual{T},2}) where T
    return DualFactors(factorize(realpart.(M)), dualpart.(M))
end

function Base.:\(Mf::DualFactors, x::AbstractArray)
    sol = Mf.RealFactors \ x
    sol -= Dual(0.0, 1.0) * (Mf.RealFactors \ (Mf.DualMatrix * sol))
    return sol
end
```

Is that sensible? Is someone willing to help write this up to work with DualNumbers.jl or ForwardDiff.jl?
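For what it's worth, here is a minimal numerical check of this factor-reuse idea (a sketch assuming DualNumbers.jl; the names `A`, `B` and the residual check are mine, and the formula is extended to also handle a dual-valued right-hand side):

```julia
using LinearAlgebra, DualNumbers

n = 5
A = rand(n, n) + n * I        # real part, made well-conditioned
B = rand(n, n)                # dual (perturbation) part
M = A + ε * B                 # dual-valued matrix
b = rand(n) + ε * rand(n)     # dual-valued right-hand side

# Proposed approach: factorize only the real part, then reuse it twice.
Af = factorize(A)
sol = Af \ realpart.(b)                       # real part of the solution
sol = sol + ε * (Af \ (dualpart.(b) - B * sol))  # dual correction

# Check the residual of the full dual system (A + εB) * sol = b.
res = M * sol - b
@assert maximum(abs.(realpart.(res))) < 1e-8
@assert maximum(abs.(dualpart.(res))) < 1e-8
```

Only one factorization of the real part is computed, and both triangular solves reuse it.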
Thank you for trying to enlighten me. However, I am still quite lost and overwhelmed by the links you provided. Could you comment on my understanding and my suggestion? Right now, I believe backslash is differentiated via the scalar rule

```julia
@define_diffrule Base.:\(x, y) = :( -($y / $x / $x) ), :( inv($x) )
```

Is this correct? I am not sure I understand exactly how that rule gets applied. Now, if what I just wrote is true, it sounds quite inefficient for treating matrix backslash, which could instead be overloaded as something like

```julia
function Base.:\(M::AbstractArray{Dual{T},2}, x::AbstractVector{Dual{T}}) where T
    A, B = factorize(realpart.(M)), dualpart.(M)
    a, b = realpart.(x), dualpart.(x)
    out = A \ a
    out = out - ε * (A \ (B * out))
    out = out + ε * (A \ b)
    return out
end
```

Can a rule be defined for non-scalars in order to take precedence and shortcut the whole thing? Even better, as mentioned in my first post, would be a rule using a new type of matrix factors, which stores the factors of only the real part of the matrix. Is this already implemented in Julia? Would it not be much faster than the current "brute-force" approach of applying the diff rule to all scalars throughout? It would also allow ForwardDiff to "percolate" through those non-Julia-native parts of factorizing and back-substituting. I don't know if I am out of my depth here, but I would love to understand this and/or help if I can.
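For reference, the two ε-corrections in that overload fall out of expanding the dual system to first order (a sketch, using the defining property ε² = 0):

```latex
(A + \varepsilon B)(x_0 + \varepsilon x_1) = a + \varepsilon b
\quad\Longrightarrow\quad
A x_0 = a, \qquad A x_1 + B x_0 = b,
```

so x₁ = A⁻¹b − A⁻¹Bx₀, i.e. a single factorization of the real part A suffices for both the real and dual parts of the solution.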
Sorry for being terse. It just seems like this issue keeps cropping up.
DiffRules.jl only defines differentiation rules for scalar-valued functions.

Julia can be about as efficient or as inefficient as you want 🙂. The direct cause of the stack overflow is that https://github.com/JuliaLang/julia/blob/v1.0.1/stdlib/SuiteSparse/src/umfpack.jl#L174 ends up calling itself recursively for element types that UMFPACK does not support.

Yes, although I don't think there's currently a precedent for that in ForwardDiff.jl itself. By the way, I'm mostly just an onlooker here; @jrevels is the main maintainer and can answer your questions better.
I'm sorry to insist, but I still think this could improve ForwardDiff.jl's efficiency when dealing with backslash:

```julia
using LinearAlgebra, DualNumbers, BenchmarkTools, SparseArrays

n = 2000 # size of J and y

# Make J and y
J = Matrix(sprand(n, n, 1/n) + I) + Matrix(sprand(n, n, 1/n) + I) * ε
y = rand(n) + rand(n) * ε

# Type that stores the required terms only
mutable struct DualJacobianFactors
    Af::LinearAlgebra.LU{Float64,Array{Float64,2}} # the factors of the real part
    B::Array{Float64,2}                            # the non-real part
end

# Constructor function for a dual-valued matrix M
fast_factorize(M::Array{Dual{Float64},2}) =
    DualJacobianFactors(factorize(realpart.(M)), dualpart.(M))

# Overload backslash (maybe this can be improved on)
function Base.:\(J::DualJacobianFactors, y::Vector{Dual{Float64}})
    a, b = realpart.(y), dualpart.(y)
    Af, B = J.Af, J.B
    out = zeros(Dual{Float64}, length(y))
    out .= Af \ a
    out .-= (Af \ (B * out)) * ε
    out .+= (Af \ b) * ε
    return out
end

# Benchmark the factorization
@benchmark (Jf = factorize(J);)
@benchmark (fast_Jf = fast_factorize(J);)

# Benchmark the back-substitution
Jf = factorize(J);
fast_Jf = fast_factorize(J);
@benchmark (x1 = Jf \ y;)
@benchmark (x2 = fast_Jf \ y;)

# Check that the results are approximately the same
x1 = Jf \ y;
x2 = fast_Jf \ y;
realpart.(x1) ≈ realpart.(x2)
dualpart.(x1) ≈ dualpart.(x2)
```

I am not sure this is the right way to benchmark these things, but you can play around with the size `n`. For example:

```julia
julia> @benchmark (Jf = factorize(J);)
BenchmarkTools.Trial:
  memory estimate:  61.05 MiB
  allocs estimate:  16
  --------------
  minimum time:     10.883 s (0.01% GC)
  median time:      10.883 s (0.01% GC)
  mean time:        10.883 s (0.01% GC)
  maximum time:     10.883 s (0.01% GC)
  --------------
  samples:          1
  evals/sample:     1

julia> @benchmark (fast_Jf = fast_factorize(J);)
BenchmarkTools.Trial:
  memory estimate:  91.57 MiB
  allocs estimate:  9
  --------------
  minimum time:     432.454 ms (2.36% GC)
  median time:      477.843 ms (2.32% GC)
  mean time:        487.862 ms (6.14% GC)
  maximum time:     592.935 ms (1.87% GC)
  --------------
  samples:          11
  evals/sample:     1
```
@briochemc Hey thanks for your solution, it helps me a lot. I patched up ForwardDiff for this functionality, as below:
I'm also not very familiar with the Dual API in ForwardDiff, so this implementation is likely to be inefficient, but it's still better than the dense factorization alternatives. As far as I understand, ForwardDiff mainly works at the scalar level, so this kind of patch is not really desired. ForwardDiff2 is the package for vector-level implementation, and with ChainRules.jl it's much easier to add customized rules, but they are still under heavy development.
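With ChainRules.jl, a forward rule for backslash could look roughly like this (a hypothetical sketch of a `frule`, not the actual ForwardDiff2 implementation; it reuses the same factorization trick discussed above):

```julia
using LinearAlgebra
using ChainRulesCore

# Hypothetical forward-mode rule for matrix-vector backslash.
# (ΔM, Δy) are the tangents of M and y; the first slot is the
# (unused) tangent of the function `\` itself.
function ChainRulesCore.frule((Δself, ΔM, Δy), ::typeof(\),
                              M::AbstractMatrix, y::AbstractVector)
    F = factorize(M)        # factorize the primal matrix once
    x = F \ y               # primal solution
    # Differentiating M*x = y gives ΔM*x + M*Δx = Δy, so:
    Δx = F \ (Δy - ΔM * x)  # reuse the same factorization for the tangent
    return x, Δx
end
```

The tangent solve costs only one extra back-substitution, not a second factorization.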
Yes, I try to follow what's happening with ForwardDiff2.jl :) FYI, I also made DualMatrixTools.jl and HyperDualMatrixTools.jl a year or so back, if that's of any interest to you.
I have a use case for this where I solve a sparse linear system a bunch of times and need to be careful about allocating memory. In case it's helpful to anyone, I've implemented the code from @FuZhiyu for sparse matrices.
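In the same allocation-careful spirit, here is a sketch of a reusable workspace (the `DualSolveWorkspace` type and `dual_solve!` helper are hypothetical names, not from any package) that keeps repeated solves free of fresh solution-vector allocations by using in-place `ldiv!` and `mul!`:

```julia
using LinearAlgebra, SparseArrays

# All buffers are preallocated once and reused across repeated solves.
# `F` is a factorization of the real part A of the matrix, `B` its dual
# part; `a`/`bd` are the real/dual parts of the right-hand side.
struct DualSolveWorkspace
    x0::Vector{Float64}   # real part of the solution
    x1::Vector{Float64}   # dual part of the solution
    tmp::Vector{Float64}  # scratch buffer for B * x0
end

DualSolveWorkspace(n::Int) = DualSolveWorkspace(zeros(n), zeros(n), zeros(n))

function dual_solve!(ws::DualSolveWorkspace, F, B, a, bd)
    ldiv!(ws.x0, F, a)        # x0 = A \ a, in place
    mul!(ws.tmp, B, ws.x0)    # tmp = B * x0, in place
    ws.tmp .= bd .- ws.tmp    # tmp = bd - B * x0
    ldiv!(ws.x1, F, ws.tmp)   # x1 = A \ (bd - B * x0), in place
    return ws
end
```

After a single `F = lu(A)`, each call to `dual_solve!` should avoid allocating new solution vectors; whether the factorization's own solve allocates internally depends on the backend.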
See also Sparspak.jl for a take on this...
Thanks, @j-fu!
I get a `StackOverflowError` if I try to differentiate some code in which a sparse matrix depends on the variable and I use backslash with it, compared to the "full" (dense) version, which works.

I feel like this could be solved by adding a rule for backslash to the underlying machinery of dual numbers. (That's what I do when I use DualNumbers.jl.) This should work and would allow LinearAlgebra to only be used with real-valued types. But I am not sure how to suggest changes directly in ForwardDiff.jl. Does that make sense?