Polymorphism in Protocol Buffers 3

Adam Matan · Nov 13, 2016

The current design

I am refactoring some existing API code that returns a feed of events for a user. The API is a normal RESTful API, and the current implementation simply queries a DB and returns a feed.

The code is long and cumbersome, so I've decided to move the feed generation to a microservice that will be called from the API server.

The new design

For the sake of decoupling, I thought the data could move back and forth between the API server and the microservice as Protobuf objects. This way, I can change the programming language on either end and still enjoy the type safety and slim size of protobuf.

The problem

The feed contains multiple types (e.g. likes, images and voice messages). In the future, new types can be added. They all share a few properties (timestamp and title, for instance), but other than that they might be completely different.

In classic OOP, the solution is simple: a base FeedItem class from which all feed items inherit, and a Feed class which contains a sequence of FeedItem objects.

How do I express the notion of Polymorphism in Protocol Buffers 3, or at least enable different types of messages in a list?

What I have checked

  • Oneof: "A oneof cannot be repeated".
  • Any: Too broad (like Java's List<Object>).

Answer

Kiril · Mar 4, 2019

The answer for serialization protocols is to use discriminator-based polymorphism. Traditional object-oriented inheritance is a form of that, with some very bad characteristics. In newer protocols like OpenAPI the concept is a bit cleaner.

Let me explain how this works with proto3.

First you need to declare your polymorphic types. Suppose we take the classic animal species problem, where different species have different properties. We first define a root type for all animals that identifies the species. Then we declare Cat and Dog messages that "extend" the base type. Note that the discriminator field species is repeated in all three, with the same field number and type:

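    syntax = "proto3";

    // Root type: carries only the discriminator
    message BaseAnimal {
      string species = 1;
    }

    message Cat {
      string species = 1;    // same number and type as in BaseAnimal
      string coloring = 10;
    }

    message Dog {
      string species = 1;    // same number and type as in BaseAnimal
      int64 weight = 10;
    }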
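
To tie this back to a list of mixed items, the base type can serve as the element type of a repeated field. The fragment below is only a sketch: Zoo and animals are hypothetical names that do not appear in the original answer, and it relies on unknown fields being preserved for each element. It is meant to sit in the same .proto file as the messages above.

```proto
// Hypothetical container (an assumption, not part of the original answer).
message Zoo {
  // Each element is parsed as a BaseAnimal. Fields that BaseAnimal does not
  // declare (coloring, weight) ride along as unknown fields.
  repeated BaseAnimal animals = 1;
}
```

A consumer would read getSpecies() on each element and then re-parse that element's toByteArray() output as the matching concrete type.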

Here is a simple Java test to demonstrate how things work in practice:

    ByteArrayOutputStream os = new ByteArrayOutputStream(1024);

    // Create a cat we want to persist or send over the wire
    Cat cat = Cat.newBuilder().setSpecies("CAT").setColoring("spotted")
            .build();

    // Since our transport or database works for animals we need to "cast"
    // or rather convert the cat to BaseAnimal
    cat.writeTo(os);
    byte[] catSerialized = os.toByteArray();
    BaseAnimal forWire = BaseAnimal.parseFrom(catSerialized);
    // Let's assert before we serialize that the species of the cat is
    // preserved
    assertEquals("CAT", forWire.getSpecies());

    // Here is the BaseAnimal serialization code we can share for all
    // animals
    os = new ByteArrayOutputStream(1024);
    forWire.writeTo(os);
    byte[] wireData = os.toByteArray();

    // Here we read back the animal from the wire data
    BaseAnimal fromWire = BaseAnimal.parseFrom(wireData);
    // If the animal is a cat then we need to read it again as a cat and
    // process the cat going forward
    assertEquals("CAT", fromWire.getSpecies());
    Cat deserializedCat = Cat.parseFrom(wireData);

    // Check that our cat has come out of the serialization
    // infrastructure intact
    assertEquals("CAT", deserializedCat.getSpecies());
    assertEquals("spotted", deserializedCat.getColoring());

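The whole trick is that the proto3 bindings preserve fields they do not understand and write them back out on serialization. (Unknown-field preservation in proto3 was reintroduced in protobuf 3.5; earlier proto3 releases dropped unknown fields when parsing.) This makes it possible to implement a proto3 "cast" (really a conversion) that changes the type of an object without losing data.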
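
To see why reading a Cat as a BaseAnimal works at the byte level, here is a minimal hand-rolled sketch of the wire format. It does not use the protobuf library: WireSketch and all of its methods are hypothetical names invented for illustration, and it only handles the two wire types that appear in the example (varint and length-delimited). Unlike the real bindings, this reader just skips fields it does not know rather than retaining them for re-serialization.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustration of the protobuf wire format; NOT the protobuf library. */
final class WireSketch {

    /** Base-128 varint: 7 payload bits per byte, high bit set on all but the last. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Appends a string field: tag, byte length, UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Bytes that a Cat with the given species and coloring has on the wire. */
    static byte[] encodeCat(String species, String coloring) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, species);
        writeString(out, 10, coloring);
        return out.toByteArray();
    }

    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    /** Reads field 1 (the discriminator) and skips every other field. */
    static String readSpecies(byte[] data) {
        int[] pos = {0};
        String species = "";
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (wireType == 0) {
                readVarint(data, pos); // varint payload, value not needed here
            } else if (wireType == 2) {
                int length = (int) readVarint(data, pos);
                if (fieldNumber == 1) {
                    species = new String(data, pos[0], length, StandardCharsets.UTF_8);
                }
                pos[0] += length; // the length prefix lets us step over unknown fields
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return species;
    }

    public static void main(String[] args) {
        byte[] cat = encodeCat("CAT", "spotted");
        System.out.println(readSpecies(cat)); // prints CAT
    }
}
```

Every field carries its own number, wire type and (where needed) length, so a reader that only knows field 1 can still step over field 10 without understanding it. That is what lets the real library carry the extra bytes along untouched.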

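Note that the "proto3 cast" is a very unsafe operation and should only be applied after the discriminator has been properly checked. In my example you can cast a cat to a dog without any error. The test below fails, because Dog.parseFrom accepts the cat's bytes without throwing and fail() is reached:

    try {
        Dog d = Dog.parseFrom(wireData);
        // Reached: the parse succeeds, so fail() makes the test fail
        fail();
    } catch (Exception e) {
        // Would mean a cat cannot be cast to a dog; never happens here
    }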
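
Because the cast itself gives no protection, the discriminator check can be centralized in a small lookup table keyed by species. The sketch below uses only the Java standard library. AnimalRegistry and its methods are hypothetical names, and the registered functions stand in for calls such as Cat.parseFrom, which would need wrapping in real code because it throws a checked InvalidProtocolBufferException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: maps a discriminator value to the parser for that type. */
final class AnimalRegistry {
    private final Map<String, Function<byte[], Object>> parsers = new HashMap<>();

    /** Registers the parser to use for one species value. */
    void register(String species, Function<byte[], Object> parser) {
        parsers.put(species, parser);
    }

    /** Parses wireData with the parser registered for species; rejects unknown species. */
    Object parse(String species, byte[] wireData) {
        Function<byte[], Object> parser = parsers.get(species);
        if (parser == null) {
            throw new IllegalArgumentException("unknown species: " + species);
        }
        return parser.apply(wireData);
    }
}
```

The caller would first read the species from the BaseAnimal and pass it in together with the raw bytes, so a cat can never reach the Dog parser by accident.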

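When the property types at the same index match, the parse succeeds and the value is silently reinterpreted, so semantic errors are possible. In my example, index 10 is an int64 in Dog and a string in Cat; proto3 treats them as different fields because their type codes on the wire differ, so the cat's coloring is simply kept as an unknown field of the dog. In other cases, for instance where the type is a string in one message and a nested message in the other (both are length-delimited on the wire), proto3 may throw an exception or produce complete garbage.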
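
The "type code on the wire" is the low three bits of each field's tag. A small sketch of the arithmetic, with TagSketch as a hypothetical name and the constants taken from the protobuf encoding rules (0 for varint, 2 for length-delimited):

```java
/** Hypothetical helper showing how a field tag is formed. */
final class TagSketch {
    static final int WIRETYPE_VARINT = 0;           // int64, int32, bool, enum, ...
    static final int WIRETYPE_LENGTH_DELIMITED = 2; // string, bytes, nested messages

    /** A tag is the field number shifted left by three bits, OR-ed with the wire type. */
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }
}
```

Here tag(10, WIRETYPE_LENGTH_DELIMITED) is 0x52 for Cat.coloring and tag(10, WIRETYPE_VARINT) is 0x50 for Dog.weight, which is why the parser can tell the two apart. A string and a nested message at the same index would both produce 0x52, so the parser has no way to notice the mismatch from the tag alone.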