R: Generate data from a probability density distribution

puslet88 · Sep 30, 2015

Say I have a simple array, with a corresponding probability distribution.

library(stats)
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

Is there a way I could generate another set of data from the same distribution? Since the sampling is random, the new data need not match the initial distribution exactly; it only has to be drawn from it.

I did not have success finding a simple solution on my own. Thanks!

Answer

user295691 · Sep 30, 2015

Your best bet is to build the cumulative distribution function (CDF) from the density estimate, approximate its inverse, and then pass uniform random numbers through that inverse. This is inverse transform sampling.

The compound expression looks like

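random.points <- approx(
  cumsum(pdf_of_data$y)/sum(pdf_of_data$y),  # normalized CDF on the density grid
  pdf_of_data$x,                             # grid points the CDF maps back to
  runif(10000),                              # uniform draws to transform
  rule = 2                                   # clamp draws below min(CDF) instead of returning NA
)$y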
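The same idea can be split into named steps, which makes it easier to reuse. This is only a sketch: the names `cdf`, `inverse_cdf` and `sample_from_density` are made up for illustration, the seed is fixed purely for reproducibility, and `rule = 2` is an assumption that clamps uniform draws smaller than the first CDF value to the left edge of the grid rather than returning `NA`.

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible

data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# 1. Numerical CDF on the density grid, normalized so it ends at exactly 1
cdf <- cumsum(pdf_of_data$y) / sum(pdf_of_data$y)

# 2. Inverse CDF by linear interpolation (CDF value -> x).
#    rule = 2 returns the nearest grid endpoint for inputs outside range(cdf).
inverse_cdf <- approxfun(cdf, pdf_of_data$x, rule = 2)

# 3. Push uniform draws through the inverse CDF (hypothetical helper)
sample_from_density <- function(n) inverse_cdf(runif(n))

new_data <- sample_from_density(1000)
```

Because the CDF is renormalized over the evaluation grid, every draw lies between `from` and `to` (here 0 and 1).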

Plotting a histogram of the result:

hist(random.points, 100)

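As a possible alternative, not part of the answer above: a Gaussian kernel density estimate is an equal-weight mixture of normal distributions centered on the data points, with standard deviation equal to `bw`. Assuming the default Gaussian kernel of `density()`, one can therefore sample by resampling the data with replacement and adding normal noise (often called a smoothed bootstrap). The variable names below are illustrative.

```r
set.seed(1)  # assumption: fixed seed, only so the example is reproducible

data <- c(0, 0.08, 0.15, 0.28, 0.90)
bw <- 0.1    # same bandwidth as in density(); it is the sd of the Gaussian kernel
n <- 10000

# Pick a data point at random, then jitter it with kernel-shaped noise
smoothed <- sample(data, n, replace = TRUE) + rnorm(n, mean = 0, sd = bw)

hist(smoothed, 100)
```

Note the difference from the interpolation approach: `from` and `to` in `density()` only limit where the estimate is evaluated, so these draws can fall outside [0, 1], whereas the inverse-CDF sample is confined to that interval.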