Running a two-sample t.test with unequal sample size in R

Trying2Learn · Oct 15, 2018

I am trying to run a two-sample t-test for a difference between a treatment and control group. Data is not paired. When I subset my original dataframe, I found that I have unequal sample sizes (not an issue by hand, but R seems to make it an issue). Here is my code:

CG<-subset(data,treat=="Control")
TG<-subset(data,treat!="Control")
agep <-t.test(CG$age~TG$age)$p.value

The error I get is the following:

Error in model.frame.default(formula = CG$age ~ TG$age) : 
variable lengths differ (found for 'TG$age')

Yes! The lengths do differ. Not sure why that's a problem if I'm not running a paired test? Thanks in advance for any help.

Answer

Cory Caaz · Oct 15, 2018

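If the two groups are independent, unequal sample sizes are not a problem: you can compare the group means in R with an unpaired two-sample t-test. The error comes from the formula interface, not from the test itself. In t.test(y ~ g), the left-hand side is the response and the right-hand side is a grouping variable, so both must have the same length. To compare two separate vectors, pass them as two arguments, t.test(x, y).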
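As a rough sketch of the two calling styles, here is an example on simulated data. The data frame, its group sizes, and its values are made up for illustration and are not the asker's data. It assumes the columns are named treat and age as in the question, and that treat has exactly two levels, which the formula interface requires.

```r
# Simulated stand-in for the asker's data (hypothetical values)
set.seed(1)
data <- data.frame(
  treat = rep(c("Control", "Treatment"), times = c(12, 20)),
  age   = c(rnorm(12, mean = 40, sd = 8), rnorm(20, mean = 44, sd = 8))
)

CG <- subset(data, treat == "Control")
TG <- subset(data, treat != "Control")

# Default interface: two numeric vectors, which may differ in length
res_xy <- t.test(CG$age, TG$age)

# Formula interface: response ~ grouping variable, both taken from the
# same data frame (the grouping variable must have exactly two levels)
res_formula <- t.test(age ~ treat, data = data)

# Both calls run the same (Welch) test, so the p-values agree
res_xy$p.value
res_formula$p.value
```

Either form works with unequal group sizes. The formula form avoids the subsetting step when the grouping column has exactly two levels.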

First, check whether your data are homoscedastic, that is, whether the variances of the two groups are homogeneous. In R this is done with Fisher's F-test, var.test(x, y), where x and y are numeric vectors.

CG <- subset(data, treat == "Control")
TG <- subset(data, treat != "Control")
var.test(CG$age, TG$age)

If the F-test returns p > 0.05, there is no evidence against equal variances, so you can treat the variances of the two samples as homogeneous. In this case, run a classic Student's two-sample t-test by setting the parameter var.equal = TRUE.

agep <- t.test(CG$age, TG$age, var.equal = TRUE)

If the F-test returns p < 0.05, then you can assume that the variances of the two groups are different (heteroscedasticity). In this case, run Welch's t-test by setting var.equal = FALSE, which is also the default for t.test.

agep <- t.test(CG$age, TG$age, var.equal = FALSE)
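
Putting the steps together, a minimal sketch on simulated data might look like the following. The data frame, group sizes, and the helper names f_p and equal_var are assumptions for illustration. The 0.05 cutoff follows the rule of thumb described above.

```r
# Simulated stand-in for the asker's data (hypothetical values)
set.seed(42)
data <- data.frame(
  treat = rep(c("Control", "Treatment"), times = c(15, 25)),
  age   = c(rnorm(15, mean = 40, sd = 5), rnorm(25, mean = 43, sd = 9))
)

CG <- subset(data, treat == "Control")
TG <- subset(data, treat != "Control")

# F-test for equality of variances; needs numeric vectors, not data frames
f_p <- var.test(CG$age, TG$age)$p.value

# Student's t-test if there is no evidence against equal variances,
# otherwise Welch's t-test
equal_var <- f_p > 0.05
res <- t.test(CG$age, TG$age, var.equal = equal_var)

# Extract the p-value, as in the original question
agep <- res$p.value

res$method  # reports which variant of the test was run
agep
```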