Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Refactor normalize_index adding parameters to centralize needed functionality #3783

Open
wants to merge 29 commits into
base: nightly
Choose a base branch
from

Conversation

martinvuyk
Copy link
Contributor

@martinvuyk martinvuyk commented Nov 19, 2024

Refactor normalize_index adding parameters to centralize needed functionality

Closes #3644
Part of #2948

Changed the assert_mode to be "none" by default (I would like it to be "warn" but it doesn't seem to work at comp time) and made it a parameter for customization. Crashing on index out of bounds access is "safe" in the sense that it does not allow the access, but highly unsafe for code where you don't expect your program to crash. I have previously expressed this opinion on the PR where this function was created. Now that I've used it again I ended up just hand rolling my own implementation, which is pointless for the supposed purpose of this function: "standard and safe way to normalize indices"

@martinvuyk martinvuyk requested a review from a team as a code owner November 19, 2024 13:03
Signed-off-by: martinvuyk <[email protected]>
@owenhilyard
Copy link
Contributor

Clamping to container length seems like the wrong behavior to me, and a great way to cause silent bugs. I'd much rather see either Optional (meaning you don't get return optimizations in safe mode), raising, or a program abort.

@martinvuyk
Copy link
Contributor Author

Clamping to container length seems like the wrong behavior to me, and a great way to cause silent bugs. I'd much rather see either Optional (meaning you don't get return optimizations in safe mode), raising, or a program abort.

@owenhilyard This is a tool, not the final implementation. Every collection will have to decide which version of this function to implement.

We could change the default to not be to clamp to container length and make it return -1 on overflow (or something along those lines) and the collection type downstream decides whether to return an optional or raise on -1. WDYT?

@owenhilyard
Copy link
Contributor

Clamping to container length seems like the wrong behavior to me, and a great way to cause silent bugs. I'd much rather see either Optional (meaning you don't get return optimizations in safe mode), raising, or a program abort.

@owenhilyard This is a tool, not the final implementation. Every collection will have to decide which version of this function to implement.

We could change the default to not be to clamp to container length and make it return -1 on overflow (or something along those lines) and the collection type downstream decides whether to return an optional or raise on -1. WDYT?

@martinvuyk I went and did some benchmarks with a few variations of how to add safety here:

No clamp:
abort: 4.26 microseconds
raising with try/except: 7.97 microseconds
optional with unwrap: 6.03 microseconds

Clamp:
abort: 8.25 microseconds
raising with try/except: 10.11 microseconds
optional to implement clamping outside of normalize_index: 14.04 microseconds

To me, it seems like optional is the least annoying to deal with while not hurting perf that much even if you do want the clamping behavior, and it forces users to handle the case where they went off the end of the array. You can't use immovable values with most collections anyway, unless you have some wrapper type your return (which can be Move). Magic return values have been shown over and over again to not be a good idea, especially since you lose half of your address space in order to return a negative number. If we can get a type which is signed and one bit larger than size_t, then we can use that, but we don't want collections to be bound to only part of an address space since having a collection that takes up more than half of your address space is a very real thing on some accelerators. We can work on compiler optimizations to return an Optional split into 2 registers, something like what I think raises already does, and I think that would be fine. If we have to go the way of bounds checking centrally and returning Optional is not an option here, we can return UInt (since we already normalized away negatives) make the magic value SIZE_MAX, since it's fairly unlikely for someone to fill their entire address space with data from one collection.

@martinvuyk
Copy link
Contributor Author

No clamp: abort: 4.26 microseconds raising with try/except: 7.97 microseconds optional with unwrap: 6.03 microseconds

Clamp: abort: 8.25 microseconds raising with try/except: 10.11 microseconds optional to implement clamping outside of normalize_index: 14.04 microseconds

Hi @owenhilyard thanks for taking the time to do that. Could you show me the code? There may be some tiny aspects that end up impacting the whole thing, e.g. if you're generating random indexes, how big the amount of indexes you're testing are, how distributed around the true bounds of the collection (realistically the amount of misses will be say < 1 % of cases). I think on the side of branch prediction there might also be some perf improvements for optional on a bigger scale, not sure of how try/except is lowered to asm (can there also be an improvement there ?). I think we might also be able to improve instruction fetching (for several index accesses) using the likely intrinsic on the optional if check 🤔. I could setup some big benchmarks on our benchmark module here when I have more time this next week (or the one after that).

To me, it seems like optional is the least annoying to deal with while not hurting perf that much even if you do want the clamping behavior, and it forces users to handle the case where they went off the end of the array

Yeah I also like optionals more for index access APIs but we need to have many APIs which are like Python's, so we need all of the configurable options inside normalize_index (clamping is also part of some Python APIs but not others)

Magic return values have been shown over and over again to not be a good idea, especially since you lose half of your address space in order to return a negative number.

True, but there are many APIs which Python inherited from C which we must also support. So we need to find a good perf normalize_index impl that is useful for all of them.

We can work on compiler optimizations to return an Optional split into 2 registers, something like what I think raises already does, and I think that would be fine

That would be great

If we have to go the way of bounds checking centrally and returning Optional is not an option here, we can return UInt (since we already normalized away negatives) make the magic value SIZE_MAX, since it's fairly unlikely for someone to fill their entire address space with data from one collection.

I think that won't hold for 32 bit systems

@owenhilyard
Copy link
Contributor

No clamp: abort: 4.26 microseconds raising with try/except: 7.97 microseconds optional with unwrap: 6.03 microseconds
Clamp: abort: 8.25 microseconds raising with try/except: 10.11 microseconds optional to implement clamping outside of normalize_index: 14.04 microseconds

Hi @owenhilyard thanks for taking the time to do that. Could you show me the code? There may be some tiny aspects that end up impacting the whole thing, e.g. if you're generating random indexes, how big the amount of indexes you're testing are, how distributed around the true bounds of the collection (realistically the amount of misses will be say < 1 % of cases). I think on the side of branch prediction there might also be some perf improvements for optional on a bigger scale, not sure of how try/except is lowered to asm (can there also be an improvement there ?). I think we might also be able to improve instruction fetching (for several index accesses) using the likely intrinsic on the optional if check 🤔. I could setup some big benchmarks on our benchmark module here when I have more time this next week (or the one after that).

I treated it as throwaway code and wrote it in /tmp, so it's gone. This was a happy path test, since you should almost never encounter the unhappy path unless things have gone wrong, so all of the indexes I generated were valid indexes to dereference. Again, these are not overheads most people will notice, and the reason I lean towards Optional is because it forces considering what happens on an invalid index when you write the code. If you REALLY need the placement optimizations, you can use a raises version or the unsafe version.

To me, it seems like optional is the least annoying to deal with while not hurting perf that much even if you do want the clamping behavior, and it forces users to handle the case where they went off the end of the array

Yeah I also like optionals more for index access APIs but we need to have many APIs which are like Python's, so we need all of the configurable options inside normalize_index (clamping is also part of some Python APIs but not others)

Python raises on error, but that is going to be very annoying to deal with, especially before we have checked errors. I lean towards Optional because you can't ignore it.

Magic return values have been shown over and over again to not be a good idea, especially since you lose half of your address space in order to return a negative number.

True, but there are many APIs which Python inherited from C which we must also support. So we need to find a good perf normalize_index impl that is useful for all of them.

The python stdlib is called through CPython, I don't think we have to re-invent C's mistakes just because Python did.

If we have to go the way of bounds checking centrally and returning Optional is not an option here, we can return UInt (since we already normalized away negatives) make the magic value SIZE_MAX, since it's fairly unlikely for someone to fill their entire address space with data from one collection.

I think that won't hold for 32 bit systems

An index of SIZE_MAX for a collection for an array of bytes would mean that not only does the code consume no memory, but also the collection itself is zero-sized. SIZE_MAX is the byte count of memory. The only thing that could run into this limit is if you had a smaller than byte type and stored it in a collection larger than 1/8 of memory, which requires a larger than register (on most systems) type to index. If you are doing this, you either are either using wide pointers or doing tricks with virtual memory/address spaces. As long as we assume that the size of the normalize_index function is non-zero, you will OOM if you try to create an array this large.

@martinvuyk martinvuyk changed the title [stdlib] Refactor normalize_index to cap to container length and other details [stdlib] Refactor normalize_index adding parameters to centralize needed functionality Jan 2, 2025
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants