How to model a HashMap/Dictionary in Protobuf efficiently

Lan · Aug 24, 2015

I have a protobuf file serialized by .NET code and I would like to consume it in Java. The .NET code uses a Dictionary data type, and the proto schema looks like this:

message Pair {
   optional string key = 1;
   optional string value = 2;
}

message Dictionary {
   repeated Pair pairs = 1;
}

This is just as described in the Stack Overflow post "Dictionary in protocol buffers".

I can use protoc to compile the proto file into Java classes, and I can deserialize the protobuf file into Java objects successfully. The only problem is that the data ends up as a List of Pair objects in Java instead of a HashMap. I still have all the data, but I cannot access it as efficiently as I would like: given a key, I have to loop through the whole list to find the corresponding value, which is O(n) per lookup.
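As a stopgap while the schema stays as it is, the deserialized list can be indexed once into a HashMap so that later lookups no longer scan the list. The following is only a minimal sketch: `Pair` here is a hand-written stand-in for the protoc-generated class (so the example runs without the protobuf runtime), and `PairIndex` and `toMap` are hypothetical names. With the real generated classes, the list would come from the Dictionary message's `getPairsList()` accessor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the protoc-generated Pair class, so the sketch
// compiles on its own. The generated class exposes the same
// getKey()/getValue() accessors.
final class Pair {
    private final String key;
    private final String value;

    Pair(String key, String value) {
        this.key = key;
        this.value = value;
    }

    String getKey() { return key; }
    String getValue() { return value; }
}

final class PairIndex {
    // Builds a HashMap from the repeated field in a single O(n) pass.
    // If a key appears more than once, the last entry wins.
    static Map<String, String> toMap(List<Pair> pairs) {
        Map<String, String> index = new HashMap<>(pairs.size() * 2);
        for (Pair p : pairs) {
            index.put(p.getKey(), p.getValue());
        }
        return index;
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair("host", "example.org"));
        pairs.add(new Pair("port", "8080"));

        Map<String, String> index = toMap(pairs);
        System.out.println(index.get("port")); // prints 8080
    }
}
```

The one-time conversion costs O(n), after which each lookup is an ordinary hash lookup. It does not change the schema or the serialized data.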

I am wondering if there is a better way to model Dictionary/Map data type in the protobuf.

Thanks

Update:

I tried Jon Skeet's suggestion of adding a map-type field to the addressbook example and still ran into an issue.

message Person {
  required string name = 1;
  required int32 id = 2;        // Unique ID number for this person.
  optional string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber phone = 4;
  map<string, string> mapdata = 5;
}

protoc throws an error when compiling:

addressbook.proto:25:3: Expected "required", "optional", or "repeated".
addressbook.proto:25:6: Expected field name.

According to the Google protobuf documentation, proto2 does support the map type (https://developers.google.com/protocol-buffers/docs/proto#maps). To quote:

Maps cannot be repeated, optional, or required.

So I don't really know why protoc cannot compile it. There is another discussion, "have to create java pojo for the existing proto includes Map", whose answer suggests that map is only a proto3 feature. This contradicts Google's documentation.

Answer

Jon Skeet · Aug 24, 2015

Well, maps are already supported in "protobuf proper" as of v3.0. For example, your proto is effectively:

message Dictionary {
    map<string, string> pairs = 1;
}

The good news is that with the key and value fields you've defined, that's fully backward-compatible with your existing data :)
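The reason this works is that a map field is encoded on the wire as a repeated embedded message whose key is field 1 and whose value is field 2, which is exactly the layout of `Pair`. The sketch below illustrates that with a hand-rolled encoder and decoder. It is not the protobuf library, it handles only the string fields used here, and `WireSketch`, `writePair` and `readAsMap` are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-rolled illustration of the wire layout; not the protobuf library API.
final class WireSketch {
    // Tag = (field_number << 3) | wire_type; wire type 2 is length-delimited.
    static final int TAG_FIELD1_LEN = (1 << 3) | 2; // 0x0A
    static final int TAG_FIELD2_LEN = (2 << 3) | 2; // 0x12

    static void writeVarint(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static void writeString(ByteArrayOutputStream out, int tag, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(tag);
        writeVarint(out, b.length);
        out.write(b, 0, b.length);
    }

    // Encodes one entry the way "repeated Pair pairs = 1" does.
    static void writePair(ByteArrayOutputStream out, String key, String value) {
        ByteArrayOutputStream entry = new ByteArrayOutputStream();
        writeString(entry, TAG_FIELD1_LEN, key);   // Pair.key = 1
        writeString(entry, TAG_FIELD2_LEN, value); // Pair.value = 2
        byte[] body = entry.toByteArray();
        out.write(TAG_FIELD1_LEN);                 // Dictionary.pairs = 1
        writeVarint(out, body.length);
        out.write(body, 0, body.length);
    }

    static int readVarint(byte[] buf, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = buf[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    static String readString(byte[] buf, int[] pos) {
        int len = readVarint(buf, pos);
        String s = new String(buf, pos[0], len, StandardCharsets.UTF_8);
        pos[0] += len;
        return s;
    }

    // Reads the same bytes as entries of "map<string, string> pairs = 1".
    static Map<String, String> readAsMap(byte[] data) {
        Map<String, String> map = new LinkedHashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            int tag = readVarint(data, pos);
            if (tag != TAG_FIELD1_LEN) {
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
            int len = readVarint(data, pos);
            int end = pos[0] + len;
            String key = "";
            String value = "";
            while (pos[0] < end) {
                int inner = readVarint(data, pos);
                if (inner == TAG_FIELD1_LEN) {
                    key = readString(data, pos);
                } else if (inner == TAG_FIELD2_LEN) {
                    value = readString(data, pos);
                } else {
                    throw new IllegalArgumentException("unexpected tag " + inner);
                }
            }
            map.put(key, value);
        }
        return map;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePair(out, "host", "example.org");
        writePair(out, "port", "8080");
        System.out.println(readAsMap(out.toByteArray()));
    }
}
```

In real code the generated classes do all of this. The point is only that bytes written as a list of `Pair` messages can be read back as map entries without any conversion step.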

The bad news is that I don't know whether or not protobuf-net supports it. If you're not actually using the .proto file on the .NET side, and are doing everything declaratively, you may just be able to modify your .proto file, regenerate the Java code, and go...

The remaining bad news is that maps were introduced in v3.0, which is still in alpha/beta at the time of this writing. Now, depending on when you need to ship, you may decide to bet on v3.0 being released by the time you need it - the benefits of having nice map syntax are pretty significant, in my view. Most of the changes being made at the moment are around the new proto3 features, whereas maps are allowed within proto2 syntax files too... it's just that you need the v3.0 compiler and runtime to use them.