[LongBench] How was Edit Sim for code tasks calculated? #96

Open
cornzz opened this issue Jan 9, 2025 · 1 comment
cornzz commented Jan 9, 2025

Hi,

I just noticed that the fuzz.ratio function from fuzzywuzzy, used for the edit sim calculation here, returns different scores depending on whether python-Levenshtein is installed. In neither case does it actually compute a Levenshtein distance.

If python-Levenshtein is not installed, it falls back to difflib.SequenceMatcher.ratio(), which, according to the docs, computes something similar to the Ratcliff/Obershelp algorithm rather than an edit similarity: python/cpython#69578 (comment)
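To illustrate the fallback path, here is a minimal sketch that calls difflib directly. The helper name `matcher_ratio` is made up for this example, and it assumes the fallback simply wraps `SequenceMatcher(None, a, b).ratio()`. The 'tide'/'diet' pair is the one from the difflib docs and shows that the score even depends on argument order, which a true edit similarity would not.

```python
from difflib import SequenceMatcher


def matcher_ratio(a: str, b: str) -> float:
    """Return 2 * M / T, where M is the number of characters in the
    matching blocks difflib finds and T is len(a) + len(b)."""
    # isjunk=None: no characters are treated as junk.
    # Note that autojunk is left at its default (True), which can change
    # results for sequences of 200 or more elements.
    return SequenceMatcher(None, a, b).ratio()


# Greedy longest-block matching is order dependent:
print(matcher_ratio("tide", "diet"))  # 0.25
print(matcher_ratio("diet", "tide"))  # 0.5
```

For comparison, the longest common subsequence of 'tide' and 'diet' has length 2, so an InDel-style ratio would be 0.5 in both directions.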

If python-Levenshtein is installed, the ratio function still does not return a Levenshtein distance ratio but the InDel ratio, which does not allow substitutions (a substitution counts as a deletion plus an insertion)! seatgeek/thefuzz#53
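The difference between the two ratios can be sketched in pure Python. This is only an illustration, not the code of either library: the function names are hypothetical, and normalizing the Levenshtein distance by the longer string's length is one common convention that I am assuming here, since the normalization used in the paper is not stated in this thread.

```python
def edit_distance(a: str, b: str, sub_cost: int) -> int:
    """Dynamic-programming edit distance with insert/delete cost 1 and a
    configurable substitution cost (1 = Levenshtein, 2 = InDel)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else sub_cost
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """1 - InDel distance / (len(a) + len(b))."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 1 - edit_distance(a, b, 2) / total


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / max(len(a), len(b)) (assumed convention)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b, 1) / longest


print(edit_distance("kitten", "sitting", 1))  # 3 (Levenshtein)
print(edit_distance("kitten", "sitting", 2))  # 5 (InDel)
print(indel_ratio("kitten", "sitting"))
print(levenshtein_similarity("kitten", "sitting"))
```

For 'kitten' vs 'sitting' the InDel ratio is 8/13 while the max-length-normalized Levenshtein similarity is 4/7, so the two metrics are not interchangeable.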

I am confused: how do I correctly evaluate the code tasks? Without this dependency something completely different is calculated, even though the paper explicitly mentions that Levenshtein distance is used for edit sim. However, this repo does not list python-Levenshtein as a dependency. And even with this dependency, it's still not the actual Levenshtein distance!

@bys0318

cornzz commented Jan 9, 2025

This seems to be a widespread inconsistency. For example, the RepoBench-P repository lists python-Levenshtein as a requirement, but the LCC repo does not mention it...

Just as an example, here are some results without python-Levenshtein:

Task: lcc - 44.52
Task: repobench-p - 45.71

With python-Levenshtein:

Task: lcc - 47.55
Task: repobench-p - 49.14
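Since the scores above differ by a few points depending on the environment, it may help to record which backend was active when reporting results. A small sketch follows, assuming that python-Levenshtein is detected through its top-level `Levenshtein` module; the function name `fuzz_ratio_backend` is hypothetical, and other packages that provide a module of the same name would be reported the same way.

```python
import importlib.util


def fuzz_ratio_backend() -> str:
    """Guess which backend a fuzzywuzzy-style ratio would use here."""
    # python-Levenshtein installs a top-level module named "Levenshtein".
    if importlib.util.find_spec("Levenshtein") is not None:
        return "python-Levenshtein (InDel ratio)"
    return "difflib.SequenceMatcher (Ratcliff/Obershelp-like)"


print("edit sim backend:", fuzz_ratio_backend())
```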
