I am running code for all possible models of a phylogenetic generalised linear model. The issue I am having is extracting and saving the beta coefficients for each model.
I want to save the coefficients into a matrix, where the columns correspond to a specific variable and the rows correspond to a formula.The issue arises because the variables are different for every model. So one cannot simply row bind the coefficients to the matrix.
The example below shows where I am up to in the problem:
y = rnorm(10)
inpdv = matrix(c(rnorm(10), runif(10), rpois(10, 1)), ncol = 3)
colnames(inpdv) = c("A", "B", "C")
data = cbind(y, inpdv)
model.mat = expand.grid(c(TRUE,FALSE), c(TRUE,FALSE), c(TRUE,FALSE))
names(model.mat) = colnames(inpdv)
formula = apply(model.mat, 1, function(x)
paste(colnames(model.mat)[x], collapse=" + "))
formula = paste("y", formula, sep = " ~ ")
formula[8] = paste(formula[8], 1, sep = "")
beta = matrix(NA, nrow = length(formula), ncol = 3)
for(i in 1:length(formula)){
fit = lm(formula(formula), data)
## Here I want to extract the beta coeffecients here into a n * k matrix
## However, I cannot find a way to assign the value to the right cell in the matrix
}
So I imagine each coefficient will need to be placed into the respective cell, but I cannot think of a quick and efficient way of doing so.
The true analysis will take place around 30, 000 times, so any tips on efficiency would also be appreciated.
Edit: So as an example, the output for a model of y ~ a + c will need to be in the form of
a NA b
Where the letters represent the coefficient for that model. The next model may be y ~ b + c which would then be added in the bottom. So the result would now look like
a NA b
NA b c
How about using names
and %in%
to subset the right columns. Extract the coefficient values using coef
. Like this:
beta = matrix(NA, nrow = length(formula), ncol = 3)
colnames(beta) <- colnames(inpdv)
for(i in 1:length(formula)){
fit = lm(formula(formula[i]), data)
coefs <- coef(fit)
beta[ i , colnames(beta) %in% names( coefs ) ] <- coefs[ names( coefs ) %in% colnames( beta ) ]
}
# A B C
#[1,] -0.4229837 -0.0519900 0.3787666
#[2,] NA 0.7015679 0.0555350
#[3,] -0.4165834 NA 0.3692974
#[4,] NA NA 0.1346726
#[5,] -0.2035173 0.7049951 NA
#[6,] NA 0.7978726 NA
#[7,] -0.2229959 NA NA
#[8,] NA NA NA
I think it's perfectly acceptable to use a for
loop here. For starters using something like lapply
sometimes keep increasing memory usage as you run more and more of the simulations. R will sometimes not mark objects from earlier models as trash until the lapply
loop finishes so so can sometimes get a memory allocation error. Using the for
loop I find that R will reuse memory allocated to the previous iteration of the loop if necessary so if you can run the loop once, you can run it lots of times.
The other reason not to use a for
loop is speed, but I would assume that the time to iterate is negligible compared to the time to fit the model so I would use it.