I'm trying to understand how data is interpreted in Caffe. For that I've taken a look at the Minst Tutorial Looking at the input data definition:
layers {
name: "mnist"
type: DATA
data_param {
source: "mnist_train_lmdb"
backend: LMDB
batch_size: 64
scale: 0.00390625
}
top: "data"
top: "label"
}
I've now looked at the mnist_train_lmdb and taken one of the entries (shown in hex):
0801101C181C229006
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000054B99F973C2400000000000000000000000000000000
000000000000DEFEFEFEFEF1C6C6C6C6C6C6C6C6AA34000000000000
00000000000043724872A3E3FEE1FEFEFEFAE5FEFE8C000000000000
000000000000000000000011420E4343433B15ECFE6A000000000000
00000000000000000000000000000000000053FDD112000000000000
000000000000000000000000000000000016E9FF5300000000000000
000000000000000000000000000000000081FEEE2C00000000000000
000000000000000000000000000000003BF9FE3E0000000000000000
0000000000000000000000000000000085FEBB050000000000000000
00000000000000000000000000000009CDF83A000000000000000000
0000000000000000000000000000007EFEB600000000000000000000
00000000000000000000000000004BFBF03900000000000000000000
0000000000000000000000000013DDFEA60000000000000000000000
00000000000000000000000003CBFEDB230000000000000000000000
00000000000000000000000026FEFE4D000000000000000000000000
00000000000000000000001FE0FE7301000000000000000000000000
000000000000000000000085FEFE3400000000000000000000000000
000000000000000000003DF2FEFE3400000000000000000000000000
0000000000000000000079FEFEDB2800000000000000000000000000
0000000000000000000079FECF120000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
2807
(I've added the line breaks here to be able to see the '7' digit.)
Now my question is, where this format is described? Or put differently where is defined that the first 36 bytes are some sort of header and the last 8 bytes have some label correspondence?
How would I go about constructing my own data? Neither Blob Tutorial nor Layers Definition give much away about required formats. My intention is not to use image data, but time series
Thanks!
I realized that protocol buffers must come into play here. So I tried to deserialize it against some of the types defined in caffe.proto.
Datum seems to be the perfect fit:
{Caffe.Datum}
Channels: 1
Data: {byte[784]}
Encoded: false
FloatData: Count = 0
Height: 28
Label: 7
Width: 28
So the answer is simply: It's a serialized representation of a 'Datum' typed instance as defined per caffe.proto
Btw. since english is not my native language I had to first realize that "Datum" is a singular form of "data"
When it comes to using your own data, it's structured as follows:
The conventional blob dimensions for data are number N x channel K x height H x width W. Blob memory is row-major in layout so the last / rightmost dimension changes fastest. For example, the value at index (n, k, h, w) is physically located at index ((n * K + k) * H + h) * W + w.
See Blobs, Layers, and Nets: anatomy of a Caffe model for reference