More RRTMGP performance work #6879
@@ -0,0 +1,3 @@
./xmlchange --append SCREAM_CMAKE_OPTIONS='SCREAM_RRTMGP_ENABLE_YAKL Off'
Since Kokkos is the default, do we need this testmod?
Probably not, unless you want to be double-sure that Kokkos is on :)
Change list:
1) Changes default RRTMGP backend to Kokkos
2) Adds new testmods for selecting RRTMGP backend
3) All kernels in rrtmgp interface can now be timed
4) Detranspose dimensions in kernels
5) Use a faster approach for getting random cldx
6) Update rrtmgp submodule
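Item 3 says every kernel in the rrtmgp interface can now be timed. The `TIMED_KERNEL` macro itself is not shown in this thread, so the following is only a minimal sketch of the general idea, assuming a host-side wall-clock timer that accumulates a total per kernel name. `timed_kernel`, `KernelStats`, and `kernel_stats` are hypothetical names for illustration, not the PR's implementation.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical accumulator: total seconds and call count per kernel name.
struct KernelStats {
  double seconds = 0.0;
  long calls = 0;
};

// Hypothetical registry of per-kernel statistics.
inline std::map<std::string, KernelStats>& kernel_stats() {
  static std::map<std::string, KernelStats> stats;
  return stats;
}

// Run `kernel` and add its wall-clock time to the entry for `name`.
// Assumption: the kernel is synchronous. With an asynchronous device
// backend, a fence (e.g. Kokkos::fence()) would be needed before the
// second clock read so that the launch alone is not what gets measured.
template <typename F>
void timed_kernel(const std::string& name, F&& kernel) {
  const auto start = std::chrono::steady_clock::now();
  kernel();
  const auto stop = std::chrono::steady_clock::now();
  KernelStats& s = kernel_stats()[name];
  s.seconds += std::chrono::duration<double>(stop - start).count();
  s.calls += 1;
}
```

Calling `timed_kernel("cldx_fill", [&] { /* kernel body */ });` around each launch gives per-kernel totals that can be compared across backends.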
@jgfouca, is this ready for review?
@AaronDonahue, yes. The current data: So, basically exact parity with YAKL. @ndkeen is checking a few things for me on pm.
Looks good, I added two non-blocking comments (no need to address them)
@@ -527,6 +563,35 @@ static void rrtmgp_main(
extra_clnclrsky_diag, extra_clnsky_diag
);

pool_t::dealloc(sw_band2gpt_mem);
I am a little uneasy about having to do a lot of memory management that is distinctly different from other processes, but we can worry about that later.
// Kokkos::parallel_for(ncol, KOKKOS_LAMBDA(int icol) {
// conv::Random rand(seeds(icol));
// for (int igpt = 0; igpt < ngpt; igpt++) {
// for (int ilay = 0; ilay < nlay; ilay++) {
// cldx(icol,ilay,igpt) = rand.genFP<RealT>();
// }
// }
// });
TIMED_KERNEL(FLATTEN_MD_KERNEL3(ncol, nlay, ngpt, icol, ilay, igpt,
You're planning to move to the device-friendly generator, right? Or have you already done so? I can't really tell. Either way, a comment would have been nice, though I assume you plan to revisit this anyway.
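For readers unfamiliar with the pattern, the replacement kernel runs over a single flattened index instead of the per-column loop nest that was commented out. The definition of `FLATTEN_MD_KERNEL3` is not shown in this thread, so the sketch below only illustrates the general index arithmetic, assuming the last dimension (`igpt`) varies fastest. `Index3`, `flatten3`, and `unflatten3` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the three recovered loop indices.
struct Index3 {
  int icol;
  int ilay;
  int igpt;
};

// Map (icol, ilay, igpt) to one linear index, with igpt varying fastest.
// Assumption: this ordering is illustrative; the real macro may differ.
inline std::int64_t flatten3(int icol, int ilay, int igpt, int nlay, int ngpt) {
  return (static_cast<std::int64_t>(icol) * nlay + ilay) * ngpt + igpt;
}

// Inverse of flatten3: recover the three indices from the linear index.
inline Index3 unflatten3(std::int64_t flat, int ncol, int nlay, int ngpt) {
  assert(ncol > 0 && nlay > 0 && ngpt > 0);
  assert(flat >= 0 && flat < static_cast<std::int64_t>(ncol) * nlay * ngpt);
  Index3 idx;
  idx.igpt = static_cast<int>(flat % ngpt);
  idx.ilay = static_cast<int>((flat / ngpt) % nlay);
  idx.icol = static_cast<int>(flat / (static_cast<std::int64_t>(ngpt) * nlay));
  return idx;
}
```

A single loop over `ncol * nlay * ngpt` work items exposes more parallelism than a loop over `ncol` alone, but each work item then needs a random value that does not depend on a sequential per-column generator state, which is the concern raised in the comment above.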
Change list:
I've attached a screenshot of my custom kernel profiler:
Changepct compares the YAKL and Kokkos versions of each kernel: anything less than 100 means Kokkos is faster, and anything above 100 means YAKL is faster. Significant (>25%) speedups are highlighted green; significant slowdowns are highlighted red. The overall time spent in run_impl is about 20% worse with Kokkos. I cannot yet account for this ~3-4 second loss of performance, because the total time spent in kernels is already lower with Kokkos.
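The profiler's source is not part of this thread. As a rough sketch of how such a column could be computed, the code below assumes Changepct is the Kokkos time expressed as a percentage of the YAKL time (consistent with "less than 100 means Kokkos is faster") and that the 25% threshold is applied symmetrically around 100. `change_pct`, `classify`, and `Highlight` are hypothetical names.

```cpp
#include <cassert>

// Assumed definition: Kokkos time as a percentage of YAKL time.
// A value of 100 means parity; below 100 means Kokkos is faster.
inline double change_pct(double yakl_seconds, double kokkos_seconds) {
  assert(yakl_seconds > 0.0);
  return 100.0 * kokkos_seconds / yakl_seconds;
}

enum class Highlight { Green, Red, None };

// Assumed thresholds: more than 25 points below 100 is a significant
// speedup (green); more than 25 points above 100 is a slowdown (red).
inline Highlight classify(double pct) {
  if (pct < 75.0) return Highlight::Green;
  if (pct > 125.0) return Highlight::Red;
  return Highlight::None;
}
```

Under these assumptions, a kernel taking 2.0 s with YAKL and 1.0 s with Kokkos gets a Changepct of 50 and is highlighted green.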