You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I think the oob score computed in the fit function is wrong.
The authors get the oob sample indices by "mask = ~samples", and then apply X[mask, :] to get the oob samples.
Actually, I test the case and found that there are many same elements between samples and X[mask,:], and the length of training samples and mask samples are the same. For example, if we totally have 100 samples, when 80 samples are used to train the model, then the length of oob samples should be 100-80=20 (without considering replacement).
I also turn to the implementation of sampling oob of randomforest, and I found following codes:
random_instance = check_random_state(random_state)
sample_indices = random_instance.randint(0, samples, max_samples) # get the indices of training samples
sample_counts = np.bincount(sample_indices, minlength=len(samples))
unsampled_mask = sample_counts == 0
indices_range = np.arange(len(samples))
unsampled_indices = indices_range[unsampled_mask] # get the indices of oob samples
then the unsampled_indices is the truely oob sample indices.
The text was updated successfully, but these errors were encountered:
I think the oob score computed in the fit function is wrong.
The authors get the oob sample indices by "mask = ~samples", and then apply X[mask, :] to get the oob samples.
Actually, I test the case and found that there are many same elements between samples and X[mask,:], and the length of training samples and mask samples are the same. For example, if we totally have 100 samples, when 80 samples are used to train the model, then the length of oob samples should be 100-80=20 (without considering replacement).
I also turn to the implementation of sampling oob of randomforest, and I found following codes:
random_instance = check_random_state(random_state)
sample_indices = random_instance.randint(0, samples, max_samples) # get the indices of training samples
sample_counts = np.bincount(sample_indices, minlength=len(samples))
unsampled_mask = sample_counts == 0
indices_range = np.arange(len(samples))
unsampled_indices = indices_range[unsampled_mask] # get the indices of oob samples
then the unsampled_indices is the truely oob sample indices.
The text was updated successfully, but these errors were encountered: