Added dotplot - show all datapoints for a category #226
Added a |
I really like how the plots look and I think they are a nice alternative to violin plots!
However, I'm not quite sure about the name. On the one hand, there's this: https://en.wikipedia.org/wiki/Dot_plot_(statistics). On the other hand, it somehow just sounds like another term for scatter plot. I don't have a better suggestion, though.
I'm also not sure if it is a problem that plots look different for the same data due to the random horizontal distribution of the points. Again, I can't think of a better implementation.
I'd rather add it to StatsPlots, than not having it.
Maybe @piever or @mkborregaard have a different opinion.
Another idea would be to add it as a style option for violin plots.
I'm not an expert on the
I think ggplot2 in R also has some jitter option, maybe we could check how they do it (if they accept that the same data could give different plots, or fix some random seed or some other solution I can't think of). |
Ahh, that's the name of that plot. #61 definitely goes beyond my immediate goals, but I do think the efforts should be combined. I'll see how I can help; perhaps a simpler version using jitter alone (which seems to be the term in use) can be added as a step toward the more complex one. Being able to specify (or retrieve?) a seed might be useful in case one finds a horizontal distribution they like... (I'm not a huge fan of the original beeswarm because it can have weird spikes that IMO distract from the distribution, though it does benefit from showing all points unambiguously.) |
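To make the seed idea concrete, here is a minimal sketch in Python (not StatsPlots code). The function name `jitter_offsets` and its `width` and `seed` parameters are hypothetical, chosen only for illustration. It shows that drawing the horizontal offsets from a private, seeded generator makes a given layout repeatable.

```python
import random


def jitter_offsets(n, width=0.3, seed=None):
    """Return n horizontal offsets drawn uniformly from [-width, width].

    Hypothetical helper: `seed` lets a caller reproduce a layout they like.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [rng.uniform(-width, width) for _ in range(n)]


a = jitter_offsets(5, seed=42)
b = jitter_offsets(5, seed=42)
c = jitter_offsets(5, seed=7)

print(a == b)  # same seed gives the same layout
print(a == c)  # a different seed gives a different layout
```

Leaving `seed=None` keeps the current behaviour (a fresh layout on every call), so a seed would only be needed when someone wants to pin a layout down.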
Thanks for this @BioTurboNick . So, I think this is a beeswarm plot + a boxplot. I think it would definitely be nice to combine this with #61 - the code I posted in a comment creates a beeswarm plot which I personally find superior to any other implementation I've seen. It does have some spikes, but that is the result of wanting to show the dots like this. As for this, I think the idiomatic way to do this in StatsPlots would be
We could have a recipe that combines these - in that case IMHO it should be named for the components, eg like @BioTurboNick would you consider trying to merge your beeswarm implementation with the implementation I posted? Or do you actually prefer the jittered version? |
@mkborregaard - Yeah, I'd be down to tackle that. It'll be a few weeks, trying to get a paper out and have a vacation. |
I've looked at the Wilkinson paper on dot plots and I'm persuaded that "dot plot" is the proper, original, general term for this type of plot. I added a non-displaced version and a more refined jittered version based on the violin plot. I think the violin-based version is a bit better, and I'll likely replace my original version with it.
As for a version that guarantees no overlap, I looked a bit into the code from #61; it needs upgrading to Julia 1+, and I'm not familiar with the old conventions. However, I think that could be added later, while the jittered version can be completed now, especially since doing it properly would require knowledge of marker size, which I understand will be coming to Plots in the future?
I also thought about the random seed issue, but I can't think of a good way to expose it. With low numbers of points, I don't believe there would be much value in choosing a particular visual distribution, and if the distribution gets dense enough, a violin plot or the non-overlap version might be a better choice.
So, my question is: what do I need to do to complete a releasable version of the jittering code? |
I had a question about the code at the top for when only y is provided. It's taken from The behavior of the code appears to be that if |
Sure, feel free to add a better comment. |
Here's a figure comparing the three modes, now specified by Thinking through the possible variants, we could have:
If these are all implemented in this one recipe, how should they be accessed? Could do: Thoughts? |
I like your thinking there |
(The conflict is the REQUIRE file - maybe just delete that and add the dep to the Project.toml instead?) |
Guess they conflict either way. |
Yeah - do you know how to do a local rebase? |
@mkborregaard - I think so, if I know what the goal is? |
As you can see your groupspacingfix PR is also based on the old master, which is what makes the commit tree look a bit non-linear. |
Ah, I see. I'll figure it out, thanks! |
I can warmly recommend GitKraken instead of fiddling around with the command line. It's just rebase -> resolve -> force push. It might be a good idea to have a backup branch pointing to this commit before attempting the rebase, in case something goes wrong. |
Oh awesome tool, thanks! |
You don't have to commit the manifest file - just delete it. It's not necessary for packages. |
There's a trend in science to show all individual data points wherever something like a boxplot would be used. This dotplot is designed to let individual data points be overlaid onto a boxplot. Violin plots, while showing the overall shape of a distribution, may be deceptive for sparse points.
Currently, it uses a window around each point (based on quantiles) plus randomization to spread the points out, so that the point density stays the same across the plot. It may be extended later to other forms (e.g. no-overlap).
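The windowing idea described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code in this PR: the function `density_jitter`, the `window_frac` and `max_width` parameters, and the choice of a fraction of the interquartile range as the window are all hypothetical. Each point's jitter width is scaled by how many points fall inside its window, so crowded regions spread wider and sparse regions stay near the centre line.

```python
import random
import statistics


def density_jitter(ys, max_width=0.4, window_frac=0.1, seed=None):
    """Return one horizontal offset per value in ys (hypothetical sketch).

    The half-width of each point's jitter is proportional to the number of
    points within a vertical window around it, so the points per unit width
    stay roughly constant across the plot.
    """
    rng = random.Random(seed)
    if len(ys) < 2:
        return [0.0 for _ in ys]  # nothing to spread out

    # Assumed window size: a fraction of the interquartile range,
    # falling back to the full range when the IQR is zero.
    q1, _, q3 = statistics.quantiles(ys, n=4)
    iqr = q3 - q1
    window = window_frac * (iqr if iqr > 0 else max(ys) - min(ys))

    # Local density: how many points (including itself) sit inside the window.
    counts = [sum(1 for other in ys if abs(other - y) <= window) for y in ys]
    peak = max(counts)

    offsets = []
    for count in counts:
        half_width = max_width * count / peak
        offsets.append(rng.uniform(-half_width, half_width))
    return offsets


ys = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]
offsets = density_jitter(ys, seed=1)
for y, dx in zip(ys, offsets):
    print(f"y={y:<5} x-offset={dx:+.3f}")
```

In this sketch the isolated value at 5.0 gets a much narrower jitter range than the cluster around 1.0, which is the behaviour the description aims for. The neighbour count is quadratic in the number of points, which is acceptable for the small samples where showing every point makes sense.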
A future addition could allow raincloud plots to be created: https://micahallen.org/2018/03/15/introducing-raincloud-plots/