I want to rename some random columns of a large data frame, and I want to use the current column names, not the indexes. Column indexes might change if I add columns to or remove columns from the data, so I figure using the existing column names is a more stable solution. This is what I have now:
mydf = merge(df.1, df.2)
colnames(mydf)[which(colnames(mydf) == "MyName.1")] = "MyNewName"
Can I simplify this code, either the original merge() call or just the second line? "MyName.1" is actually the result of an xts merge of two different xts objects.
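A minimal reproducible version of the setup, assuming a small hypothetical data frame in place of the real merged xts data (the column names other than "MyName.1" are made up). It also shows that the which() call in the second line can be dropped in base R, because a logical index selects the same column.

```r
# Hypothetical stand-in for the result of merge(df.1, df.2)
mydf <- data.frame(MyName = 1:3, MyName.1 = 4:6, Other = 7:9)

# A logical index selects the matching column directly, so which() is not needed
colnames(mydf)[colnames(mydf) == "MyName.1"] <- "MyNewName"

print(colnames(mydf))  # "MyName" "MyNewName" "Other"
```

If no column is called "MyName.1", the logical index is all FALSE and nothing is renamed, which matches what the which() version does.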
The trouble with changing column names of a data.frame is that, almost unbelievably, the entire data.frame is copied, even when it's in .GlobalEnv and no other variable points to it.
The data.table package has a setnames() function that changes column names by reference, without copying the whole dataset. data.table is different in that it doesn't copy-on-write, which can be very important for large datasets (you did say your data set was large). Simply provide the old and the new names:
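library(data.table)
DT <- data.table(MyName.1 = 1:3, other = 4:6)  # small example table
setnames(DT, "MyName.1", "MyNewName")
# or, naming the arguments explicitly (use one form or the other, not both):
# setnames(DT, old = "MyName.1", new = "MyNewName")
?setnames  # help page with the full argument list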
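Since the question is about renaming several columns, here is a small sketch assuming a hypothetical table DT with made-up column names. setnames() accepts character vectors for old and new, matched position by position, and data.table's address() function can be used to check that DT is still the same object after the rename, i.e. that no new copy of the table was bound to DT.

```r
library(data.table)

# Hypothetical example table standing in for the merged data
DT <- data.table(MyName.1 = 1:3, MyName.2 = 4:6, Keep = 7:9)

before <- address(DT)  # memory address of the table before renaming

# Rename several columns in one call; old[i] is renamed to new[i]
setnames(DT, old = c("MyName.1", "MyName.2"), new = c("First", "Second"))

after <- address(DT)   # address after renaming

print(names(DT))                 # "First" "Second" "Keep"
print(identical(before, after))  # TRUE: DT was modified in place
```

Because the lookup is by name, this keeps working when columns are added or removed elsewhere in the table.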