Graphing results of dbscan in R

droops · Jul 26, 2011

Your comments, suggestions, or solutions are/will be greatly appreciated, thank you.

I'm using the fpc package in R to run a dbscan analysis on some very dense data (3 sets of 40,000 points, with values in the range -3 to 6).

I've found some clusters, and I need to graph just the significant ones. The problem is that I have a single cluster (the first) with about 39,000 points in it. I need to graph all other clusters but this one.

The dbscan() function returns a special object that stores all of the cluster data. It isn't indexed the way a data frame would be (but maybe there is a way to represent it as one?).

I can graph the dbscan type using a basic plot() call. But, like I said, this will graph the irrelevant 39,000 points.

tl;dr: how do I graph only specific clusters of a dbscan data type?

Answer

joran · Jul 26, 2011

If you look at the help page (?dbscan), it is organized like every other R help page into sections labeled Description, Usage, Arguments, Details and Value. The Value section describes what dbscan returns. In this case it is simply a list (a standard R data type) with a few components.

The cluster component is simply an integer vector, with one entry per row of your data, that indicates which cluster each observation belongs to. So you can use this vector to subset your data, extract only the clusters you'd like, and then plot just those data points.

For example, if we use the first example from the help page:

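library(fpc)  # provides dbscan()

set.seed(665544)
n <- 600
# Ten random centers (recycled across the n points) plus Gaussian noise
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- dbscan(x, 0.2)  # second argument is eps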

we can then use the result, ds, to plot only the points in clusters 1-3:

#Plot only clusters 1, 2 and 3
plot(x[ds$cluster %in% 1:3,])
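
The same subsetting idea can be turned around to drop one dominant cluster instead of picking a few, which is closer to what the question asks. The following is only a sketch under some assumptions: it rebuilds the help-page data so it runs on its own, it treats cluster 1 as the one to exclude, and it also drops the points labeled 0, which the fpc documentation describes as noise. The names keep, x1 and x2 are illustrative, not part of fpc.

```r
library(fpc)  # assumes the fpc package is installed

# Same simulated data as the help-page example
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- dbscan(x, 0.2)

# Count the points per cluster label to spot the dominant cluster
print(table(ds$cluster))

# Logical index: TRUE for every point that is neither noise (0) nor in cluster 1
keep <- !(ds$cluster %in% c(0, 1))

# Plot only the kept points, using the cluster label as the color index
plot(x[keep, , drop = FALSE], col = ds$cluster[keep], pch = 20,
     xlab = "x1", ylab = "x2")
```

To keep the noise points in the plot, use ds$cluster != 1 as the index instead. If a data-frame view is more convenient, data.frame(x, cluster = ds$cluster) puts the labels next to the coordinates so the usual data-frame subsetting applies.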