Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host Implementation of Histogram APIs #1974

Merged
merged 66 commits into from
Jan 23, 2025

Conversation

danhoeflinger
Copy link
Contributor

@danhoeflinger danhoeflinger commented Dec 18, 2024

Implementation of histogram APIs for host backends.

Implementations are provided for serial, tbb, and openMP backends. We add a generic __thread_enumerable_storage struct to add a generic thread local storage for our host backends. We use the new TLS (Thread local storage) with parallel_for to implement histogram. Testing is also added, and some minor adjustments are made to cmake.

Please see the RFC documentation / discussion here for more details.

@danhoeflinger danhoeflinger marked this pull request as ready for review December 30, 2024 20:59
@danhoeflinger danhoeflinger changed the title [Draft] Host Implementation of Histogram APIs Host Implementation of Histogram APIs Dec 30, 2024
@danhoeflinger danhoeflinger added this to the 2022.8.0 milestone Dec 30, 2024
include/oneapi/dpl/pstl/algorithm_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
Copy link
Contributor

@adamfidel adamfidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good to me. I had to go through it a few times to understand __construct_by_args, but I think I get how it works now and why it's useful.

Just some minor comments from my side.

@@ -4289,6 +4289,86 @@ __pattern_shift_right(_Tag __tag, _ExecutionPolicy&& __exec, _BidirectionalItera
return __res.base();
}

template <typename _ForwardIterator, typename _IdxHashFunc, typename _RandomAccessIterator, class _IsVector>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why _IsVector is a class when the rest are typenames?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it seems the convention in this file is generally for everything to be class (though its not fully consistent). I'll adjust it to the norm.

std::uint32_t __count = 0;
std::uint32_t __j = 0;

for (; __j < __thread_specific_storage.size() && __count <= __i; ++__j)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::uint32_t __count = 0 could probably be moved here to the initialization expression of this loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done.

Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
for (; __first != __last; ++__first)
{
std::int32_t __bin = __func.get_bin(*__first);
if (__bin >= 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can get_bin return negative value?
  2. If yes, what is correct behavior for __brick_histogram?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, -1 is returned when the input element does not fall within any histogram bin. The correct behavior is to do nothing and skip this input element.
Specification
"Input values that do not map to a defined bin are skipped silently."

I recently looked into expanding the bin helper interface to include a separate function to check bounds, and another to get the bin which assumes it is in bounds. I thought this might provide benefit by reducing the number of branches by 1, but I saw no performance benefit from this change for CPU or GPU. It is still something we could pursue in the future.

mmichel11
mmichel11 previously approved these changes Jan 16, 2025
Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation LGTM, but we should probably hold off on merging until the RFC is first merged.

I agree with ignoring clang-format for the single difference.

CMakeLists.txt Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
Signed-off-by: Dan Hoeflinger <[email protected]>
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
Comment on lines +158 to +163
namespace __detail
{

template <typename _ValueType, typename... _Args>
struct __enumerable_thread_local_storage
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though these class templates are now internal, the API description for them is still a part of the backend API - you should know what can be done with the object obtained from the make function. We will therefore need to document it somewhere in sufficient detail - perhaps in the RFC for a start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe this interface is described in the RFC to some extent already. I can make that more specific and explicit.

include/oneapi/dpl/pstl/histogram_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/histogram_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/histogram_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/histogram_impl.h Outdated Show resolved Hide resolved
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger
<[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked through the changes since my last approval, and it LGTM as well.

@danhoeflinger danhoeflinger merged commit 780705c into main Jan 23, 2025
22 checks passed
@danhoeflinger danhoeflinger deleted the dev/dhoefling/histogram_CPU_impl branch January 23, 2025 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants