update String#trim #354
base: master
Thanks for the contribution! You're correct about the style and the proper time to update minified files. Can you also provide a test case that would fail without this change? Also, I'd prefer to separate the bug fix PR from the performance PR, if possible. Please also rebase out typo fix commits and force push :-)
Can you also link to the ES6 spec and show where these characters should be included?
The spec is here: http://ecma-international.org/ecma-262/6.0/#sec-white-space Throughout the spec, The algorithm for
So are the code points we're missing ones that were added in later Unicode versions? I'm still not clear on the problem here, but I do want to conform :-)
Looks like this patch doesn’t add any code points to the whitespace set. It only checks The patch description mentions these are commonly erroneously trimmed. I doubt the spec has info on that, @ljharb. Perhaps @lewisje can tell us which engines incorrectly trim
I don't have information on any engines that commonly erroneously trim either of those characters: I just know that this shim checks

Whitespace is defined in section 7.2 of the ES5 spec: http://es5.github.io/#x7.2 It includes

Both U+180E MONGOLIAN VOWEL SEPARATOR and U+200B ZERO WIDTH SPACE were re-classified from Zs (Space_Separator) to Cf (Format), while U+0085 NEXT LINE (NEL) is in Cc (Control); U+180E was moved in Unicode 6.3, U+200B was moved in Unicode 4.0.1, and U+0085 has been in Cc at least since Unicode 1.1.5.

Line terminators are defined in 7.3 as

The spec says, in 15.5.4.20, that String#trim must remove "both leading and trailing white space" where "white space" means both whitespace and line terminators: http://es5.github.io/#x15.5.4.20

The ES6 spec has the same definitions for "whitespace" and "line terminator" as the ES5 spec, in Tables 32 and 33, respectively.

The ES5 spec also says that Unicode 3.0 or later is followed, which means that U+180E and U+200B may be considered non-trimmable, and also that U+0085, despite appearing in Unicode's "White_Space" list, must be considered non-trimmable, because it is not in the explicit list of whitespace or line terminators and is not in Zs: http://es5.github.io/#x2

However, the ES6 spec says that Unicode 5.1.0 or later is followed, which means that U+0085 and U+200B must be considered non-trimmable, while U+180E may be considered non-trimmable. This means that the test for improperly trimmed characters in this shim must include both
Sounds good - we still need an automated test to prevent your fix from being accidentally removed later.
U+180E is
Then this means this shim is specifically targeting 5.1.0, thanks for letting me know. I'll get a test in later today, and then feel the frustration that comes with not being able to rebase from GitHub's Web-based interface.
Thanks! Totally agree that it's lame you can't rebase on the web
The ES6 spec is, and this shim aims to follow that. Thanks for bringing this up! We might be able to simplify the regexp (see my earlier comment: #354 (comment)).
The downside to using ranges in the regexp string is that it couldn't be re-purposed to check for failure to trim all trimmable whitespace (although it would save characters to use
It's ok to make multiple regexes rather than repurposing - the cost of that is negligible.
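As a rough illustration of that trade-off, the sketch below compares an enumerated run of code points with the equivalent range. It is only an assumption about what "using ranges" might look like; the variable names are hypothetical and U+2000 to U+200A is just one run where a range applies.

```javascript
// Illustrative sketch only; names are hypothetical.
// The same eleven code points, written out one by one and as a range.
var enumerated = '\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A';
var ranged = '\u2000-\u200A';

var enumeratedClass = new RegExp('^[' + enumerated + ']$');
var rangedClass = new RegExp('^[' + ranged + ']$');

// Checks that both character classes accept exactly the same characters
// in a window around the run.
function classesAgree() {
  for (var cp = 0x1FF0; cp <= 0x2010; cp += 1) {
    var ch = String.fromCharCode(cp);
    if (enumeratedClass.test(ch) !== rangedClass.test(ch)) {
      return false;
    }
  }
  return true;
}

// The enumerated string doubles as test input: every character in it is
// whitespace. The ranged string does not, since it is just three characters,
// one of them a literal hyphen.
var enumeratedIsReusable = (enumerated + 'x').trim() === 'x';
var rangedIsReusable = (ranged + 'x').trim() === 'x';
```

So the range form is shorter inside a character class, while the enumerated form can also serve as the input string for a "fails to trim" check; keeping one of each is the "multiple regexes" option.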
I never said we should! That wouldn’t work in any existing browser. I was suggesting
@lewisje Are you planning on completing this PR?
I am, but I got distracted 😞
dcbdd5d to 53beb7c
I made Maybe this means, though, that in es6-shim.js I should have also tested those two characters separately instead of in the same string.
Check for two commonly erroneously trimmed characters instead of one, and check for erroneously failing to trim the trimmable whitespace characters. I would also replace `if (typeof this === 'undefined' || this === null)` with `if (this == null)` but I don't know whether that's allowed by this library's style rules.
53beb7c to f337db9
9647bf8 to 49a96e8
Check for two commonly erroneously trimmed characters instead of one, and check for erroneously failing to trim the trimmable whitespace characters; then use `aa*` instead of `a+`, and two regexes instead of one, both because they are more performant (it is also more performant to first trim from the beginning and then trim from the end).

I would also replace `if (typeof this === 'undefined' || this === null)` with `if (this == null)` but I don't know whether that's allowed by this library's style rules.

I would also update the minified shim file, but I suspect that's something the maintainers instead do before every proper release.
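For readers who want to see the shape of the change described here, this is a minimal sketch under stated assumptions, not the code in the PR: `ws`, `leading`, `trailing` and `sketchTrim` are hypothetical names, the function takes its input as an argument instead of `this` so it can be called directly, U+180E is left out of the set, and the performance claim is not measured here.

```javascript
// Illustrative sketch only; not the shim's actual implementation.
// Trimmable characters: ES5/ES6 white space plus line terminators.
// U+180E is deliberately omitted (an assumption of this sketch).
var ws = '\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680' +
  '\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A' +
  '\u2028\u2029\u202F\u205F\u3000\uFEFF';

// Two anchored regexes instead of one alternation, each written in the
// "aa*" form rather than "a+".
var leading = new RegExp('^[' + ws + '][' + ws + ']*');
var trailing = new RegExp('[' + ws + '][' + ws + ']*$');

function sketchTrim(value) {
  // Loose equality covers both null and undefined in one comparison.
  if (value == null) {
    throw new TypeError("can't convert " + value + ' to object');
  }
  // Trim from the beginning first, then from the end.
  return String(value).replace(leading, '').replace(trailing, '');
}
```

For example, `sketchTrim('\t a b \uFEFF')` yields `'a b'`, while `sketchTrim('\u0085\u200B')` returns its input unchanged.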