Scatterplot with too many points

user702432 picture user702432 · Oct 10, 2011 · Viewed 83.4k times · Source

I am trying to plot two variables where N=700K. The problem is that there is too much overlap, so that the plot becomes mostly a solid block of black. Is there any way of having a grayscale "cloud" where the darkness of the plot is a function of the number of points in an region? In other words, instead of showing individual points, I want the plot to be a "cloud", with the more the number of points in a region, the darker that region.

Answer

joran picture joran · Oct 10, 2011

One way to deal with this is with alpha blending, which makes each point slightly transparent. So regions appear darker that have more point plotted on them.

This is easy to do in ggplot2:

df <- data.frame(x = rnorm(5000),y=rnorm(5000))
ggplot(df,aes(x=x,y=y)) + geom_point(alpha = 0.3)

enter image description here

Another convenient way to deal with this is (and probably more appropriate for the number of points you have) is hexagonal binning:

ggplot(df,aes(x=x,y=y)) + stat_binhex()

enter image description here

And there is also regular old rectangular binning (image omitted), which is more like your traditional heatmap:

ggplot(df,aes(x=x,y=y)) + geom_bin2d()