How do I generate a data set consisting of N = 100 2-dimensional samples $x = (x_1, x_2)^T \in \mathbb{R}^2$ drawn from a 2-dimensional Gaussian distribution, with mean
$\mu = (1, 1)^T$
and covariance matrix
$\Sigma = \begin{pmatrix} 0.3 & 0.2 \\ 0.2 & 0.2 \end{pmatrix}$?
I'm told that you can use the Matlab function randn, but I don't know how to implement this in Python.
Just to elaborate on @EamonNerbonne's answer: the following uses the Cholesky decomposition of the covariance matrix to generate correlated variables from uncorrelated, normally distributed random variables.
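import numpy as np
import matplotlib.pyplot as plt
linalg = np.linalg
N = 1000  # set N = 100 to match the question
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
# Reference samples drawn directly from the target distribution; shape (N, 2)
data = np.random.multivariate_normal(mean, cov, N)
# Cholesky factor: lower-triangular L with L @ L.T equal to cov
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
# Uncorrelated standard normal samples, one column per point
uncorrelated = np.random.standard_normal((2,N))
# Correlate the samples with L, then shift every column by the mean
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 1000)
plt.scatter(data2[0,:], data2[1,:], c='green')
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()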
The yellow dots were generated by np.random.multivariate_normal. The green dots were generated by multiplying uncorrelated, normally distributed points by the Cholesky decomposition matrix L and adding the mean.
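The Cholesky approach works because if z has identity covariance, then Lz has covariance L Lᵀ = Σ. As a rough sanity check, here is a minimal sketch that draws the N = 100 samples the question asks for and compares the sample mean and covariance with the targets. It assumes NumPy's newer Generator API (np.random.default_rng) instead of the legacy np.random functions used above. The seed and the row-per-sample layout are my own choices, not part of the original answer.

```python
import numpy as np

# Seed chosen arbitrarily so the run is reproducible (an assumption, not from the answer)
rng = np.random.default_rng(0)

N = 100
mean = np.array([1.0, 1.0])
cov = np.array([[0.3, 0.2],
                [0.2, 0.2]])

# Lower-triangular Cholesky factor, so that L @ L.T reproduces cov
L = np.linalg.cholesky(cov)

# Uncorrelated standard normal draws, one row per sample (Python counterpart of Matlab's randn)
z = rng.standard_normal((N, 2))

# Each row becomes L @ z_i + mean; with row vectors that is z @ L.T + mean
samples = z @ L.T + mean

print(samples.shape)                   # (N, 2)
print(samples.mean(axis=0))            # sample mean, should be near [1, 1]
print(np.cov(samples, rowvar=False))   # sample covariance, should be near cov
```

With only 100 points the sample estimates will differ noticeably from the exact mean and covariance. Increasing N should bring them closer.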