More RRTMGP performance work #6879

Open
wants to merge 11 commits into
base: master

@jgfouca jgfouca commented Jan 8, 2025

Change list:

  1. Changes default RRTMGP backend to Kokkos
  2. Adds new testmods for selecting RRTMGP backend
  3. All kernels in rrtmgp interface can now be timed
  4. Detranspose dimensions in kernels
  5. Use a faster approach for getting random cldx
  6. Update rrtmgp submodule
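
As a rough illustration of what timing a kernel involves, here is a minimal sketch of a wrapper that runs a callable and accumulates wall-clock time under a name. This is not the PR's `TIMED_KERNEL` macro, whose definition is not shown in this conversation; `KernelTimer` and `timed` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper, not part of EAMxx or RRTMGP: accumulates wall-clock
// time and call counts per kernel name.
struct KernelTimer {
  struct Record {
    double seconds = 0.0;
    long calls = 0;
  };
  std::map<std::string, Record> records;

  // Run `kernel` once and charge its elapsed time to `name`.
  template <typename F>
  void timed(const std::string& name, F&& kernel) {
    assert(!name.empty());
    const auto start = std::chrono::steady_clock::now();
    kernel();
    // Assumption: on an asynchronous device backend, a fence (for example
    // Kokkos::fence()) would be needed here before stopping the clock.
    const auto stop = std::chrono::steady_clock::now();
    Record& rec = records[name];
    rec.seconds += std::chrono::duration<double>(stop - start).count();
    rec.calls += 1;
  }
};
```

A caller would wrap each kernel launch, for example `timer.timed("cldx", [&] { /* launch */ });`, and read `timer.records` afterwards to build a per-kernel table.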

I've attached a screenshot of my custom kernel profiler:
Screenshot 2025-01-08 at 3 04 10 PM

Changepct compares the YAKL and Kokkos versions of each kernel: a value below 100 means Kokkos is faster, and a value above 100 means YAKL is faster. Significant (>25%) speedups are highlighted green; significant slowdowns are highlighted red. The overall time spent in run_impl is about 20% worse with Kokkos. I cannot yet account for this ~3-4 second loss of performance, because the total time spent in kernels is already lower with Kokkos.
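
One possible reading of the Changepct column, sketched under assumptions: the exact formula is not stated in this thread, so the code below assumes it is the Kokkos time expressed as a percentage of the YAKL time, and that "significant" means more than 25 points away from parity. `changepct` and `classify` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Assumed definition: Kokkos time as a percentage of YAKL time.
// Below 100 means Kokkos is faster; above 100 means YAKL is faster.
double changepct(double yakl_seconds, double kokkos_seconds) {
  assert(yakl_seconds > 0.0);
  return 100.0 * kokkos_seconds / yakl_seconds;
}

// Assumed highlighting rule: more than 25 points from parity is significant.
std::string classify(double pct) {
  if (pct < 75.0) return "green";  // significant Kokkos speedup
  if (pct > 125.0) return "red";   // significant Kokkos slowdown
  return "none";
}
```

Under these assumptions, a kernel that takes 2.0 s with YAKL and 1.0 s with Kokkos has a Changepct of 50 and would be highlighted green.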

@jgfouca added the EAMxx (PRs focused on capabilities for EAMxx) and BFB (PR leaves answers BFB) labels Jan 8, 2025
@@ -0,0 +1,3 @@
./xmlchange --append SCREAM_CMAKE_OPTIONS='SCREAM_RRTMGP_ENABLE_YAKL Off'
Contributor


Since Kokkos is the default, do we need this testmod?

Member Author


Probably not unless you want to be double-sure that Kokkos is on :)

@jgfouca force-pushed the jgfouca/more_rrtmgp_perf branch from a8779ef to 0cea725 on January 10, 2025 18:21
@jgfouca force-pushed the jgfouca/more_rrtmgp_perf branch from 81a1e70 to 3751726 on January 13, 2025 21:58
@AaronDonahue
Contributor

@jgfouca, is this ready for review?

@jgfouca
Member Author

jgfouca commented Jan 24, 2025

@AaronDonahue, yes. The current data:

Screenshot 2025-01-24 at 11 36 24 AM

So, basically exact parity with YAKL. @ndkeen is checking a few things for me on pm.

Contributor

@mahf708 mahf708 left a comment


Looks good. I added two non-blocking comments (no need to address them).

@@ -527,6 +563,35 @@ static void rrtmgp_main(
extra_clnclrsky_diag, extra_clnsky_diag
);

pool_t::dealloc(sw_band2gpt_mem);
Contributor


I am a little uneasy about having to do a lot of memory management that is distinctly different from other processes, but we can worry about it later

Comment on lines +1178 to +1186
// Kokkos::parallel_for(ncol, KOKKOS_LAMBDA(int icol) {
// conv::Random rand(seeds(icol));
// for (int igpt = 0; igpt < ngpt; igpt++) {
// for (int ilay = 0; ilay < nlay; ilay++) {
// cldx(icol,ilay,igpt) = rand.genFP<RealT>();
// }
// }
// });
TIMED_KERNEL(FLATTEN_MD_KERNEL3(ncol, nlay, ngpt, icol, ilay, igpt,
Contributor


Are you planning to move to the device-friendly generator, or have you already done so? I can't really tell. Either way, a comment here would have been nice, though I assume you plan to revisit this anyway.
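
For readers unfamiliar with the pattern in the diff above: the commented-out loop parallelizes only over columns, whereas a flattened kernel exposes all `ncol * nlay * ngpt` iterations as a single index range. Below is a minimal serial sketch of that index decomposition. It assumes the last index varies fastest; the actual ordering and definition of `FLATTEN_MD_KERNEL3` are not shown in this conversation, and `Index3`, `unflatten3`, and `for_each_flat3` are hypothetical names.

```cpp
#include <cassert>

struct Index3 {
  int i, j, k;
};

// Recover (i, j, k) from a flat index over an n1 x n2 x n3 range,
// with k varying fastest (assumed ordering).
Index3 unflatten3(long flat, int n2, int n3) {
  assert(n2 > 0 && n3 > 0 && flat >= 0);
  const int k = static_cast<int>(flat % n3);
  const int j = static_cast<int>((flat / n3) % n2);
  const int i = static_cast<int>(flat / (static_cast<long>(n2) * n3));
  return {i, j, k};
}

// Serial stand-in for a flattened parallel loop: visits every (i, j, k)
// in the range exactly once.
template <typename F>
void for_each_flat3(int n1, int n2, int n3, F&& body) {
  const long total = static_cast<long>(n1) * n2 * n3;
  for (long flat = 0; flat < total; ++flat) {
    const Index3 idx = unflatten3(flat, n2, n3);
    body(idx.i, idx.j, idx.k);
  }
}
```

In a real device kernel the outer `for` would be replaced by a parallel dispatch over `total`, and each iteration would need its own independent random value, which is the concern the comment above raises.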

Labels: BFB (PR leaves answers BFB), EAMxx (PRs focused on capabilities for EAMxx)

4 participants