Various errors working with DTables.jl #438
@StevenWhitaker can you try reproducing these again on the latest Dagger release?
Thanks for getting a patch released! The issues are different now, so that's something ;) Now I observe the following behavior (EDIT: when running Julia with multiple threads):
I realized that I start Julia with multiple threads by default, so I also ran the code with a single thread. So, besides the one sporadic error, this issue seems to be addressed, assuming the issues I observed with multiple threads are due to the interplay between multithreading and the packages involved.
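For context, the thread count is fixed at Julia startup, so switching between the two configurations means relaunching the process. A minimal sketch of checking which case you're in (the script name is a placeholder):

```julia
# Launch configurations (shell commands shown as comments):
#   julia --threads=1 script.jl      # single-threaded run
#   julia --threads=auto script.jl   # one thread per CPU core
using Base.Threads

println("Running with $(nthreads()) thread(s)")  # prints 1 in the single-threaded case
```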
Edit to my previous comment: I'm running my actual code with a single thread now, and it also hangs, so there might be something else still at play.
I can reproduce the hangs - I'll keep investigating! Thanks for your patience 🙂
Running through your example with Dagger's logging enabled, I find that we spend a good bit of time (about 0.3-0.5 s for me) in the reduction. A large portion of that time is spent in the GC (about 40% of the time over ~80K allocations totaling ~500MB), so I suspect allocations are what's killing performance. If I can figure out how to reduce those allocations, it would also be reasonable to parallelize the reduction. Additionally, a few other calls took a while as well.

EDIT: Those timings and allocations are so high because of logging - they drop significantly when logging is disabled, although then I see a ton of long-lived allocations that threaten to crash Julia. I still need to see if some of those allocations can be reduced.

EDIT 2: Silly me, these reductions are already asynchronous 😄 I guess the task completes before we return from the call.
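As a side note for anyone following along: you don't need Dagger's logging machinery to get a first read on this kind of overhead, since Base's `@timed` already reports elapsed time, bytes allocated, and GC time. A minimal sketch, with a generic allocating workload standing in for the actual reduction:

```julia
# @timed returns a NamedTuple with fields value, time, bytes, gctime, ...
stats = @timed sum(rand(10^7))  # placeholder workload, not the DTables reduction

gc_frac = stats.gctime / stats.time
println("allocated $(stats.bytes) bytes; GC was $(round(100 * gc_frac; digits=1))% of runtime")
```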
Ok, something that I would recommend is, instead of the current approach, the alternative I sketched. Can you test that and confirm whether it speeds your script up sufficiently for it to complete in a reasonable amount of time?
Thanks for the tip. I tried it out on my actual project (not the exact example in the OP), and it does seem to help, but I still see the code hang occasionally. I'm pretty sure it's not just taking forever, because when the code does complete, it doesn't take that long, and when it hangs the CPU utilization drops to 0.

It actually seems to be the case that my code hangs only when calling my main function again after a successful run - or at least the chances of hanging are higher in that case. I'm not really sure why that would be, though. I also saw a new error (when calling my main function):
It looks like it has to do with file loading. This is the code I use to load .csv files:

```julia
DTable(x -> CSV.File(x), [filepath]; tabletype = DataFrame)
```

I only saw the error once, though. And another one-time error (in the function containing the reduction):
The above errors occurred when calling my main function the first time.
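Since the loading snippet above comes up repeatedly in this thread, here is a self-contained version of that pattern (the path is a placeholder; `fetch` on a `DTable` materializes it into the requested `tabletype`):

```julia
using CSV, DataFrames, DTables

filepath = "file.csv"  # placeholder for the real 157 MB file
dt = DTable(x -> CSV.File(x), [filepath]; tabletype = DataFrame)

df = fetch(dt)  # collect the distributed chunks back into a DataFrame
@show size(df)
```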
I tried to create an MWE that was closer to the actual workflow I'm working with. I'm guessing the errors occurring here are related to #437 (one of the four reported errors below is the same as the linked issue). I hope this is helpful and not just extra noise!
Contents of `mwe.jl`:

I `include`d `mwe.jl` in a fresh Julia session multiple times (meaning each `include` occurred in its own fresh Julia session) and recorded the following errors. Note that nothing changed in `mwe.jl` from run to run.

Error 1:
Error 2:
Error 3:
Error 3b: Occasionally the segfault was preceded by one or more occurrences of:
Error 4:
Comments:

- Error 1 is the `MethodError` with `convert`. I most commonly run into the error mentioned in #437 (comment) ("Multiple concurrent writes to Dict detected!" with `DTables.reduce`), which I did not see with `mwe.jl`.
- `"file.csv"` is a 157 MB table with 233930 rows and 102 columns of `String` and `Float64` values.
- `remotecall` probably isn't necessary for reproducing the bugs, but I included it because that is how my actual code is structured (see the sketch below).
probably isn't necessary for reproducing the bugs, but I included it because that is how my actual work is.The text was updated successfully, but these errors were encountered: