How to convert an RGB image (3 channels) to grayscale (1 channel) and save it?

J. Devez · Oct 10, 2018

I'm working on a deep learning project and have a lot of images that don't need color. I saved them with:

import matplotlib.pyplot as plt

plt.imsave('image.png', image, format='png', cmap='gray')

However, when I later checked the shape of the image, the result was:

import cv2
img_rgb = cv2.imread('image.png')
print(img_rgb.shape)
(196,256,3)

So even though the image I view is in grayscale, it still has 3 color channels. I realized I had to do some algebraic operations to convert those 3 channels into a single channel.

I have tried the methods described on the thread "How can I convert an RGB image into grayscale in Python?" but I'm confused.

For example, I tried the conversion using:

from skimage import color
from skimage import io
img_gray = color.rgb2gray(io.imread('image.png'))
plt.imsave('image_gray.png', img_gray, format='png')

However, when I load the new image and check its shape:

img_gr = cv2.imread('image_gray.png')
print(img_gr.shape)
(196,256,3)

I tried the other methods on that thread but the results are the same. My goal is to have images with a (196,256,1) shape, given how much less computationally intensive it will be for a Convolutional Neural Network.

Any help would be appreciated.

Answer

jmsinusa · Oct 10, 2018

Your first code block:

import matplotlib.pyplot as plt
plt.imsave('image.png', image, format='png', cmap='gray')

This is saving the image as RGB, because cmap='gray' is ignored when supplying RGB data to imsave (see pyplot docs).

You can convert your data to grayscale either with color.rgb2gray, as you have (it computes a weighted sum of the three bands), or by taking the plain average of the three bands, which is what I tend to do with numpy:

import numpy as np
from matplotlib import pyplot as plt
import cv2

img_rgb = np.random.rand(196,256,3)
print('RGB image shape:', img_rgb.shape)

img_gray = np.mean(img_rgb, axis=2)
print('Grayscale image shape:', img_gray.shape)

Output:

RGB image shape: (196, 256, 3)
Grayscale image shape: (196, 256)
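If you want a weighted conversion rather than a plain average, and the (196, 256, 1) shape from the question, something like the following should work. This is only a sketch: the ITU-R BT.601 weights are one common choice that I am assuming here, not necessarily the ones rgb2gray uses.

```python
import numpy as np

# Stand-in for a real image: random RGB values in [0, 1].
img_rgb = np.random.rand(196, 256, 3)

# Luminance weights for R, G, B (ITU-R BT.601); an assumed choice, others exist.
weights = np.array([0.299, 0.587, 0.114])

# Weighted sum over the channel axis gives a 2D array.
img_gray = img_rgb @ weights
print('Weighted grayscale shape:', img_gray.shape)

# Append a trailing channel axis for code that expects (height, width, channels).
img_gray_1ch = img_gray[..., np.newaxis]
print('With channel axis:', img_gray_1ch.shape)
```

Output:

Weighted grayscale shape: (196, 256)
With channel axis: (196, 256, 1)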

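img_gray is now the correct shape. However, if you save it using plt.imsave, it will still write three (or four) bands: imsave maps a 2D array through a colormap and stores the resulting RGB(A) colors, so pass cmap='gray' to get R == G == B for each pixel. This is a property of imsave, not of PNG, which can store a single grayscale band. In addition, cv2.imread returns three channels by default, whatever the file contains.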

plt.imsave('image_gray.png', img_gray, format='png', cmap='gray')
new_img = cv2.imread('image_gray.png')
print('Loaded image shape:', new_img.shape)

Output:

Loaded image shape: (196, 256, 3)

One way to avoid this is to save each image, or indeed a whole batch of images, as a numpy file:

np.save('np_image.npy', img_gray)
new_np = np.load('np_image.npy')
print('new_np shape:', new_np.shape)

Output:

new_np shape: (196, 256)
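For the batch case, a rough sketch could look like this. The batch size of 8, the float32 dtype, the 'images' key and the file name are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Stand-in batch: 8 grayscale images (hypothetical count).
images = [np.random.rand(196, 256) for _ in range(8)]

# Stack into one array and add a trailing channel axis: (N, H, W, 1).
batch = np.stack(images)[..., np.newaxis].astype(np.float32)

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'gray_batch.npz')
np.savez_compressed(path, images=batch)

# Load the array back by the key it was saved under.
with np.load(path) as data:
    loaded = data['images']
print('Batch shape:', loaded.shape)
```

Output:

Batch shape: (8, 196, 256, 1)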

The other thing you could do is save the grayscale png (using imsave) and then have OpenCV load it as a single-channel grayscale image by passing 0 (cv2.IMREAD_GRAYSCALE) as the second argument to imread:

finalimg = cv2.imread('image_gray.png',0)
print('finalimg image shape:', finalimg.shape)

Output:

finalimg image shape: (196, 256)