MemoryError: Unable to allocate 115. GiB for an array with shape (1122, 1122, 12288) and data type float64

John Jones · May 22, 2020 · Viewed 7.5k times

I am trying to write a function that returns a flattened array of images and labels, and my OS is Windows 10. When I call the function, I get the error described in the title:

MemoryError: Unable to allocate 115. GiB for an array with shape (1122, 1122, 12288) and data type float64

What I want to do: inside a function, extract features from a dataset with keypoints drawn on the images, and then use train_test_split on that dataset. But whenever I try to flatten the images with keypoints, I get this error; the only images I can flatten are the same images without keypoints.
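For scale, the allocation in the traceback checks out: float64 is 8 bytes per element, so an array of the reported shape really does need about 115 GiB:

```python
# 8 bytes per float64 element; shape taken straight from the error message.
n_bytes = 1122 * 1122 * 12288 * 8
print(n_bytes / 2**30)  # ≈ 115.25 GiB
```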

Here's how I was trying to do it:

import cv2
import numpy as np
from skimage.transform import resize
from sklearn.utils import Bunch

def load_image_files(fullpath, dimension=(35, 35)):
    flat_data = []
    images = []
    orb = cv2.ORB_create(edgeThreshold=1, nfeatures=22)
    key_points = [cv2.KeyPoint(64, 9, 10),
                  cv2.KeyPoint(107, 6, 10),
                  cv2.KeyPoint(171, 10, 10)]
    for image in imageList:
        kp, des = orb.compute(image, key_points)
        kparray = cv2.drawKeypoints(image, kp, None,
                                    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
        img_resized = resize(kparray, dimension, anti_aliasing=True, mode='reflect')
        flat_data.append(img_resized.flatten())
        images.append(flat_data)
    flat_data = np.array(flat_data)
    images = np.array(images)
    return Bunch(data=flat_data,
                 images=images)

Answer

Evil Angel · May 22, 2020

Here in your function, you are appending all of your flattened images to a single list, which is causing this memory error. Instead, you can use Dask arrays to store them. A Dask array uses the hard disk to hold data that is too large to fit in memory. Dask is a Python library, similar to Spark, that has been designed for big data.
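A minimal sketch of that idea, assuming the image count and feature length from the error message; `load_one` here is a hypothetical stand-in for the real read/resize/flatten step:

```python
import numpy as np
import dask
import dask.array as da

N_IMAGES, N_FEATURES = 1122, 12288  # taken from the error's shape

@dask.delayed
def load_one(i):
    # Hypothetical stand-in for reading, resizing and flattening one image;
    # replace with the real per-image pipeline.
    rng = np.random.default_rng(i)
    return rng.random(N_FEATURES)

# One lazy row per image; nothing is loaded into memory yet.
rows = [da.from_delayed(load_one(i), shape=(N_FEATURES,), dtype=np.float64)
        for i in range(N_IMAGES)]

# A lazy (1122, 12288) array. Dask only pulls the chunks it needs into
# RAM while a computation runs, spilling the rest as needed.
data = da.stack(rows, axis=0)

print(data.shape)  # (1122, 12288)
print(float(data.mean().compute()))  # evaluated chunk by chunk
```

Each delayed row is only materialized while Dask is actually computing with it, so the full dataset never has to exist in memory as one giant NumPy array.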