I have created a simple function for face recognition using the FaceRecognizer from OpenCV. It works fine with images of people.
Now I would like to test it with handwritten characters instead of people. I came across the MNIST dataset, but it stores the images in a file format I have never seen before.
I simply need to extract a few images from train-images.idx3-ubyte and save them in a folder as .gif files.
Or am I misunderstanding this MNIST thing? If so, where could I get such a dataset?
EDIT
I also have the gzip file:
train-images-idx3-ubyte.gz
I am trying to read the content, but show() does not work, and if I call read() I see random symbols.
import gzip

images = gzip.open("train-images-idx3-ubyte.gz", 'rb')
print(images.read())
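The random symbols are expected, because the file is binary rather than text. MNIST image files use the IDX format: a 16-byte header of four big-endian 32-bit integers (magic number 2051, image count, rows, columns), followed by one unsigned byte per pixel. A minimal sketch of reading that layout with the standard library follows; the function names parse_idx_images and read_idx_images are made up for illustration and are not part of any package.

```python
import gzip
import struct


def parse_idx_images(raw):
    """Split the decompressed bytes of an IDX3 image file into images."""
    # Header: four big-endian unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803: unsigned bytes, 3 dimensions
        raise ValueError("unexpected magic number: %d" % magic)
    size = rows * cols
    # Each image is rows * cols consecutive bytes, stored row by row.
    images = [raw[16 + i * size:16 + (i + 1) * size] for i in range(count)]
    return rows, cols, images


def read_idx_images(path):
    """Hypothetical helper: decompress a .gz IDX3 file and parse it."""
    with gzip.open(path, "rb") as f:
        return parse_idx_images(f.read())
```

Calling read_idx_images("train-images-idx3-ubyte.gz") should then return the image height, the image width, and a list of bytes objects, one per image.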
EDIT
Managed to get some useful output by using:
with gzip.open('train-images-idx3-ubyte.gz', 'r') as fin:
    for line in fin:
        print('got line', line)
Now I somehow have to convert this to an image. Output:
Download the training/test images and labels:
And uncompress them in a workdir, say samples/.
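If you would rather do the uncompress step from Python than with a command-line tool, something like the following should work. It is only a sketch: gunzip_all is a made-up helper name, and it assumes the downloaded archives sit in the workdir with a .gz extension.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(workdir):
    """Decompress every *.gz file in workdir, next to the archive."""
    extracted = []
    for archive in sorted(Path(workdir).glob("*.gz")):
        target = archive.with_suffix("")  # drop the trailing ".gz"
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files are fine
        extracted.append(target)
    return extracted
```

For example, gunzip_all('samples') would write samples/train-images-idx3-ubyte next to samples/train-images-idx3-ubyte.gz and leave the archive in place.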
Get the python-mnist package from PyPI:
pip install python-mnist
Import the mnist package and read the training/test images:
from mnist import MNIST
mndata = MNIST('samples')
images, labels = mndata.load_training()
# or
images, labels = mndata.load_testing()
To display an image to the console:
import random

index = random.randrange(0, len(images))  # choose an index ;-)
print(mndata.display(images[index]))
You'll get something like this:
............................
............................
............................
............................
............................
.................@@.........
..............@@@@@.........
............@@@@............
..........@@................
..........@.................
...........@................
...........@................
...........@...@............
...........@@@@@.@..........
...........@@@...@@.........
...........@@.....@.........
..................@.........
..................@@........
..................@@........
..................@.........
.................@@.........
...........@.....@..........
...........@....@@..........
............@@@@............
.............@..............
............................
............................
............................
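To see how a flat list of 784 pixel values turns into a 28x28 picture like the one above, here is a rough sketch of an equivalent renderer. It is not the package's own code: render_ascii is a made-up name, and the threshold of 200 is an assumption chosen for illustration.

```python
def render_ascii(pixels, width=28, threshold=200):
    """Render a flat, row-major list of grey values as '@' and '.' rows."""
    rows = []
    for start in range(0, len(pixels), width):
        row = pixels[start:start + width]
        # Bright pixels become '@', everything else becomes '.'.
        rows.append("".join("@" if value > threshold else "." for value in row))
    return "\n".join(rows)
```

Calling print(render_ascii(images[index])) should give a comparable picture, though the exact pixels shown depend on the threshold.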
Explanation:
Each image in the images list is a Python list of unsigned bytes.
The labels are a Python array of unsigned bytes.
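Since each image is just a flat sequence of 784 grey values, saving one as a .gif is mostly a matter of telling an imaging library the width and height. Below is a minimal sketch that assumes the third-party Pillow package is installed (pip install Pillow); to_rows and save_as_gif are made-up helper names, and the 28-pixel width is the standard MNIST image size.

```python
def to_rows(pixels, width=28):
    """Reshape a flat, row-major pixel sequence into a list of rows."""
    if len(pixels) % width != 0:
        raise ValueError("pixel count is not a multiple of the width")
    return [list(pixels[i:i + width]) for i in range(0, len(pixels), width)]


def save_as_gif(pixels, path, width=28):
    """Write one image to disk; requires Pillow (an assumption here)."""
    from PIL import Image  # imported lazily so to_rows works without Pillow

    height = len(pixels) // width
    # "L" is Pillow's 8-bit greyscale mode, one byte per pixel.
    image = Image.frombytes("L", (width, height), bytes(pixels))
    image.save(path)  # the format is inferred from the ".gif" extension
```

For example, save_as_gif(images[0], 'samples/digit_0.gif') would write the first training image; the output filename is only an illustration.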