[stdlib] Refactor `normalize_index` adding parameters to centralize needed functionality #3783

martinvuyk · 2024-11-19T13:03:10Z

Refactor normalize_index adding parameters to centralize needed functionality

Closes #3644
Part of #2948

Changed the assert_mode to be "none" by default (I would like it to be "warn" but it doesn't seem to work at comp time) and made it a parameter for customization. Crashing on index out of bounds access is "safe" in the sense that it does not allow the access, but highly unsafe for code where you don't expect your program to crash. I have previously expressed this opinion on the PR where this function was created. Now that I've used it again I ended up just hand rolling my own implementation, which is pointless for the supposed purpose of this function: "standard and safe way to normalize indices"

Signed-off-by: martinvuyk <[email protected]>

…ze-index

owenhilyard · 2024-12-05T03:02:16Z

Clamping to container length seems like the wrong behavior to me, and a great way to cause silent bugs. I'd much rather see either Optional (meaning you don't get return optimizations in safe mode), raising, or a program abort.

martinvuyk · 2024-12-05T12:17:21Z

Clamping to container length seems like the wrong behavior to me, and a great way to cause silent bugs. I'd much rather see either Optional (meaning you don't get return optimizations in safe mode), raising, or a program abort.

@owenhilyard This is a tool, not the final implementation. Every collection will have to decide which version of this function to implement.

We could change the default to not be to clamp to container length and make it return -1 on overflow (or something along those lines) and the collection type downstream decides whether to return an optional or raise on -1. WDYT?

owenhilyard · 2024-12-06T23:30:31Z

Clamping to container length seems like the wrong behavior to me, and a great way to cause silent bugs. I'd much rather see either Optional (meaning you don't get return optimizations in safe mode), raising, or a program abort.

@owenhilyard This is a tool, not the final implementation. Every collection will have to decide which version of this function to implement.

We could change the default to not be to clamp to container length and make it return -1 on overflow (or something along those lines) and the collection type downstream decides whether to return an optional or raise on -1. WDYT?

@martinvuyk I went and did some benchmarks with a few variations of how to add safety here:

No clamp:
abort: 4.26 microseconds
raising with try/except: 7.97 microseconds
optional with unwrap: 6.03 microseconds

Clamp:
abort: 8.25 microseconds
raising with try/except: 10.11 microseconds
optional to implement clamping outside of normalize_index: 14.04 microseconds

To me, it seems like optional is the least annoying to deal with while not hurting perf that much even if you do want the clamping behavior, and it forces users to handle the case where they went off the end of the array. You can't use immovable values with most collections anyway, unless you have some wrapper type your return (which can be Move). Magic return values have been shown over and over again to not be a good idea, especially since you lose half of your address space in order to return a negative number. If we can get a type which is signed and one bit larger than size_t, then we can use that, but we don't want collections to be bound to only part of an address space since having a collection that takes up more than half of your address space is a very real thing on some accelerators. We can work on compiler optimizations to return an Optional split into 2 registers, something like what I think raises already does, and I think that would be fine. If we have to go the way of bounds checking centrally and returning Optional is not an option here, we can return UInt (since we already normalized away negatives) make the magic value SIZE_MAX, since it's fairly unlikely for someone to fill their entire address space with data from one collection.

martinvuyk · 2024-12-07T14:33:13Z

No clamp: abort: 4.26 microseconds raising with try/except: 7.97 microseconds optional with unwrap: 6.03 microseconds

Clamp: abort: 8.25 microseconds raising with try/except: 10.11 microseconds optional to implement clamping outside of normalize_index: 14.04 microseconds

Hi @owenhilyard thanks for taking the time to do that. Could you show me the code? There may be some tiny aspects that end up impacting the whole thing, e.g. if you're generating random indexes, how big the amount of indexes you're testing are, how distributed around the true bounds of the collection (realistically the amount of misses will be say < 1 % of cases). I think on the side of branch prediction there might also be some perf improvements for optional on a bigger scale, not sure of how try/except is lowered to asm (can there also be an improvement there ?). I think we might also be able to improve instruction fetching (for several index accesses) using the likely intrinsic on the optional if check 🤔. I could setup some big benchmarks on our benchmark module here when I have more time this next week (or the one after that).

To me, it seems like optional is the least annoying to deal with while not hurting perf that much even if you do want the clamping behavior, and it forces users to handle the case where they went off the end of the array

Yeah I also like optionals more for index access APIs but we need to have many APIs which are like Python's, so we need all of the configurable options inside normalize_index (clamping is also part of some Python APIs but not others)

Magic return values have been shown over and over again to not be a good idea, especially since you lose half of your address space in order to return a negative number.

True, but there are many APIs which Python inherited from C which we must also support. So we need to find a good perf normalize_index impl that is useful for all of them.

We can work on compiler optimizations to return an Optional split into 2 registers, something like what I think raises already does, and I think that would be fine

That would be great

If we have to go the way of bounds checking centrally and returning Optional is not an option here, we can return UInt (since we already normalized away negatives) make the magic value SIZE_MAX, since it's fairly unlikely for someone to fill their entire address space with data from one collection.

I think that won't hold for 32 bit systems

owenhilyard · 2024-12-07T21:44:38Z

No clamp: abort: 4.26 microseconds raising with try/except: 7.97 microseconds optional with unwrap: 6.03 microseconds
Clamp: abort: 8.25 microseconds raising with try/except: 10.11 microseconds optional to implement clamping outside of normalize_index: 14.04 microseconds

Hi @owenhilyard thanks for taking the time to do that. Could you show me the code? There may be some tiny aspects that end up impacting the whole thing, e.g. if you're generating random indexes, how big the amount of indexes you're testing are, how distributed around the true bounds of the collection (realistically the amount of misses will be say < 1 % of cases). I think on the side of branch prediction there might also be some perf improvements for optional on a bigger scale, not sure of how try/except is lowered to asm (can there also be an improvement there ?). I think we might also be able to improve instruction fetching (for several index accesses) using the likely intrinsic on the optional if check 🤔. I could setup some big benchmarks on our benchmark module here when I have more time this next week (or the one after that).

I treated it as throwaway code and wrote it in /tmp, so it's gone. This was a happy path test, since you should almost never encounter the unhappy path unless things have gone wrong, so all of the indexes I generated were valid indexes to dereference. Again, these are not overheads most people will notice, and the reason I lean towards Optional is because it forces considering what happens on an invalid index when you write the code. If you REALLY need the placement optimizations, you can use a raises version or the unsafe version.

To me, it seems like optional is the least annoying to deal with while not hurting perf that much even if you do want the clamping behavior, and it forces users to handle the case where they went off the end of the array

Yeah I also like optionals more for index access APIs but we need to have many APIs which are like Python's, so we need all of the configurable options inside normalize_index (clamping is also part of some Python APIs but not others)

Python raises on error, but that is going to be very annoying to deal with, especially before we have checked errors. I lean towards Optional because you can't ignore it.

Magic return values have been shown over and over again to not be a good idea, especially since you lose half of your address space in order to return a negative number.

True, but there are many APIs which Python inherited from C which we must also support. So we need to find a good perf normalize_index impl that is useful for all of them.

The python stdlib is called through CPython, I don't think we have to re-invent C's mistakes just because Python did.

If we have to go the way of bounds checking centrally and returning Optional is not an option here, we can return UInt (since we already normalized away negatives) make the magic value SIZE_MAX, since it's fairly unlikely for someone to fill their entire address space with data from one collection.

I think that won't hold for 32 bit systems

An index of SIZE_MAX for a collection for an array of bytes would mean that not only does the code consume no memory, but also the collection itself is zero-sized. SIZE_MAX is the byte count of memory. The only thing that could run into this limit is if you had a smaller than byte type and stored it in a collection larger than 1/8 of memory, which requires a larger than register (on most systems) type to index. If you are doing this, you either are either using wide pointers or doing tricks with virtual memory/address spaces. As long as we assume that the size of the normalize_index function is non-zero, you will OOM if you try to create an array this large.

…ze-index

Signed-off-by: martinvuyk <[email protected]>

refactor normalize_index to cap to container length and other details

8ca410b

Signed-off-by: martinvuyk <[email protected]>

martinvuyk requested a review from a team as a code owner November 19, 2024 13:03

martinvuyk added 9 commits November 19, 2024 10:08

make assert_mode a parameter

a958314

Signed-off-by: martinvuyk <[email protected]>

change default assert_mode to none

5099768

Signed-off-by: martinvuyk <[email protected]>

try fix watever this cryptic assert error means

b2e24ac

Signed-off-by: martinvuyk <[email protected]>

try fix whatever this cryptic assert error means

672b579

Signed-off-by: martinvuyk <[email protected]>

deactivate test_out_of_bounds_message() test

94f1cc4

Signed-off-by: martinvuyk <[email protected]>

change default assert_mode to none

301ab45

Signed-off-by: martinvuyk <[email protected]>

try fixing checking the asserts

21e84c1

Signed-off-by: martinvuyk <[email protected]>

fix assert checks

e7521ba

Signed-off-by: martinvuyk <[email protected]>

fix container length bug

1070521

Signed-off-by: martinvuyk <[email protected]>

JoeLoser requested review from ConnorGray and jackos November 19, 2024 13:58

fix container length bug

7276ee1

Signed-off-by: martinvuyk <[email protected]>

JoeLoser assigned jackos Nov 21, 2024

Merge remote-tracking branch 'upstream/nightly' into refactor-normali…

7666950

…ze-index

martinvuyk mentioned this pull request Dec 11, 2024

[stdlib] Inline unnecessary helper functions for List #3857

Open

martinvuyk added 2 commits January 2, 2025 12:14

Merge remote-tracking branch 'upstream/nightly' into refactor-normali…

b8a8ea5

…ze-index

add branchless parameter and rename to clamp

c5ca2cf

Signed-off-by: martinvuyk <[email protected]>

martinvuyk changed the title ~~[stdlib] Refactor normalize_index to cap to container length and other details~~ [stdlib] Refactor normalize_index adding parameters to centralize needed functionality Jan 2, 2025

martinvuyk added 5 commits January 2, 2025 12:44

try fix filecheck

10e0757

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

de13973

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

bf617ff

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

2b0cf83

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

4634605

Signed-off-by: martinvuyk <[email protected]>

martinvuyk added 10 commits January 2, 2025 14:05

try fix filecheck

43424a6

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

6ff6a48

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

6645afe

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

5c7ccad

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

b906e2d

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

f46e2d2

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

cc3f26a

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

c86dc14

Signed-off-by: martinvuyk <[email protected]>

try fix filecheck

a0d013b

Signed-off-by: martinvuyk <[email protected]>

mojo format

c9799f2

Signed-off-by: martinvuyk <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[stdlib] Refactor `normalize_index` adding parameters to centralize needed functionality #3783

[stdlib] Refactor `normalize_index` adding parameters to centralize needed functionality #3783

martinvuyk commented Nov 19, 2024 •

edited

Loading

owenhilyard commented Dec 5, 2024

martinvuyk commented Dec 5, 2024

owenhilyard commented Dec 6, 2024

martinvuyk commented Dec 7, 2024

owenhilyard commented Dec 7, 2024

[stdlib] Refactor normalize_index adding parameters to centralize needed functionality #3783

Are you sure you want to change the base?

[stdlib] Refactor normalize_index adding parameters to centralize needed functionality #3783

Conversation

martinvuyk commented Nov 19, 2024 • edited Loading

owenhilyard commented Dec 5, 2024

martinvuyk commented Dec 5, 2024

owenhilyard commented Dec 6, 2024

martinvuyk commented Dec 7, 2024

owenhilyard commented Dec 7, 2024

[stdlib] Refactor `normalize_index` adding parameters to centralize needed functionality #3783

[stdlib] Refactor `normalize_index` adding parameters to centralize needed functionality #3783

martinvuyk commented Nov 19, 2024 •

edited

Loading