A basic/common class in R is called "dist", and is a relatively efficient representation of a symmetric distance matrix. Unlike a "matrix" object, however, there does not seem to be support for manipulating a "dist" instance by index pairs using the "[" operator.
For example, each of the following calls returns nothing, NULL, or an error:
# First, create an example dist object from a matrix
mat1 <- matrix(1:100, 10, 10)
rownames(mat1) <- 1:10
colnames(mat1) <- 1:10
dist1 <- as.dist(mat1)
# Now try to access index features, or index values
names(dist1)
rownames(dist1)
row.names(dist1)
colnames(dist1)
col.names(dist1)
dist1[1, 2]
Meanwhile, the following commands do work, in some sense, but do not make it any easier to access/manipulate particular index-pair values:
dist1[1] # R thinks of it as a vector, not a matrix?
attributes(dist1)
attributes(dist1)$Diag <- FALSE
mat2 <- as(dist1, "matrix")
mat2[1, 2] <- 0
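The vector-like behavior of dist1[1] is expected. According to the ?dist help page, a "dist" object is the lower triangle of the distance matrix stored by columns in a plain vector, with the matrix metadata kept in attributes such as Size and Labels. The sketch below checks this layout on the example object from the question; it only uses base/stats functions.

```r
# Rebuild the example object from the question
mat1 <- matrix(1:100, 10, 10)
rownames(mat1) <- 1:10
colnames(mat1) <- 1:10
dist1 <- as.dist(mat1)

# The attributes carry the "matrix" metadata
attr(dist1, "Size")    # n = 10 observations
attr(dist1, "Labels")  # "1" ... "10", taken from rownames(mat1)
length(dist1)          # n * (n - 1) / 2 = 45 stored values

# The stored values are the lower triangle of mat1, read column by column
all.equal(as.numeric(dist1), as.numeric(mat1[lower.tri(mat1)]))  # TRUE

# Single-bracket indexing addresses that vector: element 1 is mat1[2, 1]
dist1[1] == mat1[2, 1]  # TRUE
```

Any pair-wise accessor therefore has to map a (row, column) pair to a position in this vector.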
A workaround, which I want to avoid, is to first convert the "dist" object to a "matrix", manipulate that matrix, and then convert it back to "dist". In other words, this is not a question about how to convert a "dist" instance into a "matrix" or some other class where common matrix-indexing tools are already defined; that has been answered in several ways in a different SO question.
Are there tools in the stats package (or perhaps some other core R package) dedicated to indexing/accessing elements of an instance of "dist"?
There is no standard way of doing this, unfortunately. Here are two functions that convert between the 1D vector index and the 2D matrix coordinates. They aren't pretty, but they work, and at least you can use the code to build something nicer if you need it. I'm posting them mainly because the equations aren't obvious.
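# Given row i, column j (requires i < j) and size n, return the position
# of element (i, j) in the underlying vector of a "dist" object
distdex <- function(i, j, n) {
  n * (i - 1) - i * (i - 1) / 2 + j - i
}
# Given vector index ix and size n, return the matching row and column
# (always with row < column)
rowcol <- function(ix, n) {
  nr <- ceiling(n - (1 + sqrt(1 + 4 * (n^2 - n - 2 * ix))) / 2)
  nc <- n - (2 * n - nr + 1) * nr / 2 + ix + nr
  cbind(nr, nc)
}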
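One way to derive these formulas, assuming the column-by-column lower-triangle layout that as.dist and dist use, and writing pairs with the smaller index first (i < j), as distdex does. Column k of the lower triangle holds the n - k pairs (k, k+1), ..., (k, n), so columns 1 to i - 1 hold

$$\sum_{k=1}^{i-1} (n-k) = n(i-1) - \frac{i(i-1)}{2}$$

entries, and (i, j) is the (j - i)-th entry of column i:

$$\mathrm{ix}(i,j) = n(i-1) - \frac{i(i-1)}{2} + j - i .$$

For the inverse, the row of index ix is the smallest i whose cumulative count reaches ix, i.e. the smallest i with $ni - \tfrac{i(i+1)}{2} \ge \mathrm{ix}$, equivalently $i^2 - (2n-1)\,i + 2\,\mathrm{ix} \le 0$. Taking the smaller root and rounding up gives

$$i = \left\lceil \frac{(2n-1) - \sqrt{(2n-1)^2 - 8\,\mathrm{ix}}}{2} \right\rceil ,$$

which is the nr line of rowcol, because $(2n-1)^2 - 8\,\mathrm{ix} = 1 + 4\,(n^2 - n - 2\,\mathrm{ix})$. Solving the first formula for j then gives $j = \mathrm{ix} - n(i-1) + \tfrac{i(i-1)}{2} + i$, which expands to the nc line.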
A little test harness to show it works:
testd <- dist(rnorm(20))
as.matrix(testd)[7, 13]  # row < col
distdex(7, 13, 20)       # = 105
testd[105]               # same value as above
testd[c(42, 119)]
rowcol(c(42, 119), 20)   # = (3, 8) and (8, 15)
as.matrix(testd)[3, 8]
as.matrix(testd)[8, 15]