I'm using the survival
library. After computing the Kaplan-Meier estimator of a survival function:
km = survfit(Surv(time, flag) ~ 1)
I know how to compute percentiles:
quantile(km, probs = c(0.05,0.25,0.5,0.75,0.95))
But, how do I compute the mean survival time?
The mean survival time will in general depend on what value is chosen for the maximum survival time. You can get the restricted mean survival time with print(km, print.rmean=TRUE)
. By default, this assumes that the longest survival time is equal to the longest survival time in the data. You can set this to a different value by adding an rmean
argument (e.g., print(km, print.rmean=TRUE, rmean=250)
).
In response to your comment: I initially figured one could extract the mean survival time by looking at the object returned by print(km, print.rmean=TRUE)
, but it turns out that print.survfit
doesn't return a list object but just returns text to the console.
Instead, I looked through the code of print.survfit
(you can see the code by typing getAnywhere(print.survfit)
in the console) to see where the mean survival time is calculated. It turns out that a function called survmean
takes care of this, but it's not an exported function, meaning R won't recognize the function when you try to run it like a "normal" function. So, to access the function, you need to run the code below (where you need to set rmean
explicitly):
survival:::survmean(km, rmean=60)
You'll see that the function returns a list where the first element is a matrix with several named values, including the mean and the standard error of the mean. So, to extract, for example, the mean survival time, you would do:
survival:::survmean(km, rmean=60) [[1]]["*rmean"]
The help for print.survfit
provides details on the options and how the restricted mean is calculated:
?print.survfit
The mean and its variance are based on a truncated estimator. That is, if the last observation(s) is not a death, then the survival curve estimate does not go to zero and the mean is undefined. There are four possible approaches to resolve this, which are selected by the rmean option. The first is to set the upper limit to a constant, e.g.,rmean=365. In this case the reported mean would be the expected number of days, out of the first 365, that would be experienced by each group. This is useful if interest focuses on a fixed period. Other options are "none" (no estimate), "common" and "individual". The "common" option uses the maximum time for all curves in the object as a common upper limit for the auc calculation. For the "individual"options the mean is computed as the area under each curve, over the range from 0 to the maximum observed time for that curve. Since the end point is random, values for different curves are not comparable and the printed standard errors are an underestimate as they do not take into account this random variation. This option is provided mainly for backwards compatability, as this estimate was the default (only) one in earlier releases of the code. Note that SAS (as of version 9.3) uses the integral up to the last event time of each individual curve; we consider this the worst of the choices and do not provide an option for that calculation.