Generate a data set consisting of N=100 2-dimensional samples

pythonnewbie picture pythonnewbie · Feb 17, 2013 · Viewed 9.8k times · Source

How do I generate a data set consisting of N = 100 2-dimensional samples x = (x1,x2)T ∈ R2 drawn from a 2-dimensional Gaussian distribution, with mean

µ = (1,1)T

and covariance matrix

Σ = (0.3 0.2 
     0.2 0.2)

I'm told that you can use a Matlab function randn, but don't know how to implement it in Python?

Answer

unutbu picture unutbu · Feb 17, 2013

Just to elaborate on @EamonNerbonne's answer: the following uses Cholesky decomposition of the covariance matrix to generate correlated variables from uncorrelated normally distributed random variables.

import numpy as np
import matplotlib.pyplot as plt
linalg = np.linalg

N = 1000
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
data = np.random.multivariate_normal(mean, cov, N)
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2,N))
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 1000)
plt.scatter(data2[0,:], data2[1,:], c='green')    
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()

enter image description here

The yellow dots were generated by np.random.multivariate_normal. The green dots were generated by multiplying normally distributed points by the Cholesky decomposition matrix L.