Some initial profiling output #2
Comments
You may want to try using Intel VTune profiler.
I looked into that, but it seems it's not free, so I'm stuck with open source tools.
A quick Google search gave me this page: https://stackoverflow.com/a/7190210.
@hcho3 @thvasilo Intel VTune is free only if you are a student, educator, researcher, or a contributor to significant open source projects in computing. xgboost probably falls into the last category, which means you should have access to Intel Parallel Studio Professional for Linux for free for 12 months (non-Professional for Windows).
Thanks @Laurae2, I had originally thought VTune was not free even for students. This should help!
@thvasilo You have to make sure you are downloading the Professional (or Cluster) version of Intel Parallel Studio. The Composer edition (the default one) is provided without Intel VTune. I get Intel Parallel Studio Cluster Edition from the Educator package; students should get the same version. Open source contributors get Professional instead of Cluster when using Linux. On Windows, it is Composer. Intel does not make this obvious; it could be the Cluster edition for everyone eligible...
@thvasilo Small update: Intel Parallel Studio Cluster Edition is free for students for 12 months. It includes all Intel software, including Intel VTune (currently at 2018 Update 4, although there is a 2019 Initial Release). If you use a recent OS with a recent kernel, make sure to use a newer hardware sampling driver, otherwise Intel VTune will complain about an error: https://software.intel.com/sites/default/files/managed/ac/c1/sepdk_v5_575421.tar.gz
Ubuntu users must also install libelf:
@Laurae2 Thanks, that's the one I ended up with. I'll try it out this week.
@thvasilo There is a bug in Intel's web interface that prevents the direct download of Intel VTune 2019, which is required for the most recent kernels / OS (it has updated sampling drivers). You can download Intel Parallel Studio XE 2019 Cluster Edition for Linux, and use "Customize" from the GUI (
@thvasilo Did you manage to find anything interesting with the Intel software? (The tools are not too hard to understand if you are used to perf / valgrind, especially because the Intel software has a GUI for everything.)
If you use Intel compilers, which you will need for profiling with maximum data gathering, on a new terminal:
I use the following flags to compile with cmake for profiling:
Intel VTune Amplifier, on a new terminal:
Intel Advisor, on a new terminal:
Intel Inspector, on a new terminal:
Hello @Laurae2, I've been working on some PhD stuff lately, so I haven't had time to look into this. Thanks for all the advice though, it will come in very handy when I get around to it.
I did some basic profiling with callgrind, running with 6 threads like so:
valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes ./perflab record/ 6
I've attached the call graph visualization from KCachegrind and the file for anyone who might want to explore on their own. Since a lot of the work is done through OpenMP, it might be hard to see clearly what's happening without recompiling that to add debug information.
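For anyone repeating this run, here is a minimal sketch of one way to get per-thread profiles from callgrind. It reuses the binary and arguments from the command above. The `--separate-threads=yes` option and `callgrind_annotate` are standard Valgrind tooling, but the CMake build type mentioned in the comment is only an assumption about how this project might be built with debug symbols; it has not been checked against this repository.

```shell
# Sketch only: BIN and ARGS are copied from the command quoted in this issue.
BIN=./perflab
ARGS="record/ 6"

# --separate-threads=yes makes callgrind write one profile per thread
# (callgrind.out.<pid>-<threadID>) instead of a single merged file.
CALLGRIND_OPTS="--tool=callgrind --dump-instr=yes --collect-jumps=yes --separate-threads=yes"

if command -v valgrind >/dev/null 2>&1 && [ -x "$BIN" ]; then
    # Assumption: the binary was rebuilt with optimizations plus debug symbols
    # first, e.g. with a CMake build: cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
    # $CALLGRIND_OPTS and $ARGS are left unquoted on purpose so they split
    # into separate words.
    valgrind $CALLGRIND_OPTS "$BIN" $ARGS

    # Print the top of the per-function summary for each per-thread profile.
    for f in callgrind.out.*-*; do
        [ -e "$f" ] || continue
        callgrind_annotate "$f" | head -n 30
    done
else
    echo "valgrind or $BIN not available; skipping profiling run"
fi
```

The per-thread files can also be opened in KCachegrind, which may make it easier to tell the OpenMP worker threads apart from the main thread.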
Note that due to the overhead introduced by valgrind, the data loading numbers may be exaggerated. In this case it seems to take ~50% of the time, which is definitely not representative of a real run.
The output was
which shows the level of overhead: normally, data loading takes ~18 seconds and gradient computation ~11 seconds.
callgrind.out.txt
@hcho3 Do you have any tips for profiling multi-threaded applications?