"Flat is better than nested" - for data as well as code?

Karl Knechtel picture Karl Knechtel · Dec 7, 2010 · Viewed 10.9k times · Source

This question got me thinking: should we apply the principle that "flat is better than nested" to data as well as to code? Even when there is a "logical tree structure" to the data?

In this case, I suppose it would mean representing children as a list of IDs, rather than an actual list of children, with all the nodes in a single list:

[ {'id': 4, 'children': ()},
  {'id': 2, 'children': (1, 7)},
  {'id': 1, 'children': (6, 5)},
  {'id': 6, 'children': ()},
  {'id': 5, 'children': ()},
  {'id': 7, 'children': (3,)},
  {'id': 3, 'children': ()} ]

(I used tuples because I prefer not to give myself the flexibility to mutate an object until that flexibility proves itself to be useful and usable in a clear manner. In any case I would never use None here instead of an empty sequence, because it complicates the logic, and "special cases aren't special enough" - here, it isn't special at all.)

Certainly this is shorter, but the tree structure is obscured. Does that contradict "explicit is better than implicit"?

Personally I find that "flat is better than nested" is of limited applicability, and nowhere near the most important aspect of the Zen. (Certainly I could not do a lot of the nice functional-programming things that I do if I didn't allow myself significant nesting.) I suspect that the problem with "nested" is that it requires context switching when you comprehend the information. I really think this is more of a problem when following imperative logic than for parsing data or functional-style code - where it's easier to just mentally name the nested block, and consider its workings separately from the outer context.

What say you?

Answer

Jim Mischel picture Jim Mischel · Dec 7, 2010

This is a completely subjective question. The answer is, "it depends."

It depends on the primary use of your data. If you continually have to reference the nested structure, then it makes sense to represent it that way. And if you never reference the flat representation except when building the nested structure, then why have the flat structure at all?

The "flat" representation is one of the basics of the relational database model: each type of data exists in a table just for that type, and any relationships among the items are contained in separate tables. It's a useful abstraction, but at times difficult to work with in code. On the other hand, processing all the data of a particular type is trivial.

Consider, for example, if I wanted to find all the descendants of the record with id 2 in your example data. If the data is already in the hierarchy (i.e. native representation is the "nested" structure), then it's trivial to locate record id 2 and then traverse its children, children's children, etc.

But if the native representation is sequential as you've shown it then I have to pass through the entire data set to create the hierarchical structure and then find record 2 and its children.

So, as I said, "it depends."