[stdlib] Fix String.split() implementations #3528
base: nightly
Signed-off-by: martinvuyk <[email protected]>
Great job. I've just added a nitpick suggestion.
Also, would it be possible to add a unit test?
Co-authored-by: Manuel Saelices <[email protected]> Signed-off-by: martinvuyk <[email protected]>
Hi, thanks for the review. Do you have any type of test in mind that the split test cases don't cover?
I thought this check would be broken in
❯ git diff
diff --git a/stdlib/test/collections/test_string.mojo b/stdlib/test/collections/test_string.mojo
index a664d321..b7a85c6c 100644
--- a/stdlib/test/collections/test_string.mojo
+++ b/stdlib/test/collections/test_string.mojo
@@ -824,6 +824,11 @@ def test_split():
     assert_equal(res6[2], "долор")
     assert_equal(res6[3], "сит")
     assert_equal(res6[4], "амет")
+    var res7 = in6.split("м")
+    assert_equal(res7[0], "Лоре")
+    assert_equal(res7[1], " ипсу")
+    assert_equal(res7[2], " долор сит а")
+    assert_equal(res7[3], "ет")
BTW, I still think it's a good test to add.
I'm not understanding, so the lines from
It's just a diff in case you want to extend it with more tests. LGTM anyway, so don't worry. Thanks for the contribution 🥇
…o into fix-split-implementations
@JoeLoser I removed all uses of generic origin functions and went straight to an unsafe pointer and a byte length for the internal split implementation. Would you mind testing this so that we can merge it before the next stable release (if you haven't already branched from nightly at this point)?
Happy to see how this looks internally. Do you mind rebasing to fix the conflicts so I can sync it? Otherwise the sync won't work.
@JoeLoser fixed all conflicts. @ConnorGray regarding commit 6acac62,
Main issue
Fix String.split() implementations to use a generic implementation that does not assume indexing is by byte offset. Added all methods to StringLiteral and StringSlice. Some important optimizations were added by parametrizing and avoiding slicing with numeric tricks.

Changes in behavior
This PR changes split("") behavior to be non-raising and to return the separated unicode characters, analogous to when the whole string has the separator at the start, at the end, and in between every character. Closes #3635
String, StringLiteral, and StringSlice .split() now return a List[StringSlice].
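
To make the described split("") behavior concrete, here is a minimal sketch in Python rather than Mojo. It is not the PR's implementation. The helper name split_on_empty is hypothetical, and the leading and trailing empty strings are an assumption based on a literal reading of "separator at the start, at the end, and in between every character". Python's own str.split("") raises ValueError, which is why the helper is written by hand.

```python
def split_on_empty(text: str) -> list[str]:
    """Hypothetical helper mimicking the described split("") result.

    Assumption: the result is bracketed by empty strings, as if a
    separator sat before, between, and after every character.
    """
    # list(text) iterates by Unicode code point, not by byte offset.
    return [""] + list(text) + [""]


# For a non-empty word, the same shape falls out of an ordinary split
# when a real separator is placed at the start, the end, and between
# every character.
word = "амет"
explicit = "," + ",".join(word) + ","
assert explicit.split(",") == split_on_empty(word)
print(split_on_empty(word))
```

Each Cyrillic letter here occupies two bytes in UTF-8, so an implementation that indexed by byte offset would cut characters in half. Iterating by code point avoids that.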

Benchmark results:
CPU: Intel® Core™ i7-7700HQ
improvement metric: markdown percentage improvement ((old_value - new_value) / old_value)
Average improvement for split with a sequence: 91.2486%. As a speedup factor, this is roughly an 11x improvement
Average improvement for split on any whitespace: 99.9975%. As a speedup factor, this is roughly a 40,000x improvement
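
The relationship between the percentage metric and the quoted speedup factors can be checked with a short Python sketch. The function names are hypothetical and chosen for illustration. The only inputs are the two average percentages reported above; no raw timings are assumed.

```python
def improvement(old_value: float, new_value: float) -> float:
    """Fractional improvement as defined above: (old - new) / old."""
    return (old_value - new_value) / old_value


def speedup_from_improvement(fraction: float) -> float:
    """Convert a fractional improvement into an old/new speedup factor.

    If new = old * (1 - fraction), then old / new = 1 / (1 - fraction).
    """
    return 1.0 / (1.0 - fraction)


sequence_speedup = speedup_from_improvement(0.912486)
whitespace_speedup = speedup_from_improvement(0.999975)
print(f"split with a sequence: about {sequence_speedup:.1f}x")
print(f"split on any whitespace: about {whitespace_speedup:.0f}x")
```

This gives about 11.4x and about 40,000x, consistent with the rounded figures in the description.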
bench_string_split[1000000]
bench_string_split_none[1000000]