RESTful design: when to use sub-resources?

Gili · Nov 21, 2012 · Viewed 28.2k times

When designing resource hierarchies, when should one use sub-resources?

I used to believe that when a resource could not exist without another, it should be represented as its sub-resource. I recently ran across this counter-example:

  • An employee is uniquely identifiable across all companies.
  • An employee's access control and life-cycle depend on the company.

I modeled this as: /companies/{companyName}/employee/{employeeId}

Notice that I don't need to look up the company in order to locate the employee, so should I? If I do, I pay the price of looking up information I don't need. If I don't, this URL mistakenly returns HTTP 200:

/companies/{nonExistingName}/employee/{existingId}
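To make the trade-off concrete, here is a minimal Python sketch of the two options. The in-memory `COMPANIES` and `EMPLOYEES` tables and both function names are hypothetical, made up for illustration only; a real service would query a database instead.

```python
# Hypothetical in-memory data standing in for database tables.
COMPANIES = {"acme"}
EMPLOYEES = {42: {"name": "Alice", "company": "acme"}}


def get_employee_unchecked(company_name: str, employee_id: int) -> int:
    """Locate the employee by its globally unique ID, ignoring the company segment."""
    return 200 if employee_id in EMPLOYEES else 404


def get_employee_checked(company_name: str, employee_id: int) -> int:
    """Pay for the extra look-ups so that a bogus company segment yields 404."""
    if company_name not in COMPANIES:
        return 404
    employee = EMPLOYEES.get(employee_id)
    if employee is None or employee["company"] != company_name:
        return 404
    return 200
```

With this data, `get_employee_unchecked("no-such-company", 42)` yields 200, which is the mistaken response described above, while `get_employee_checked("no-such-company", 42)` yields 404 at the cost of the additional look-up.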

  1. How should I represent the fact that a resource belongs to another?
  2. How should I represent the fact that a resource cannot be identified without another?
  3. What relationships are sub-resources meant and not meant to model?

Answer

Gili · Oct 10, 2013

A year later, I ended up with the following compromise (for database rows that contain a unique identifier):

  1. Assign all resources a canonical URI at the root (e.g. /companies/{id} and /employees/{id}).
  2. If a resource cannot exist without another, it should be represented as its sub-resource; however, treat the operation like a search engine query. That is, instead of carrying out the operation immediately, simply return HTTP 307 ("Temporary Redirect") pointing at the canonical URI. This will cause clients to repeat the operation against the canonical URI.
  3. Your specification document should only expose root resources that match your conceptual model (not dependent on implementation details). Implementation details might change (your rows might no longer be uniquely identifiable) but your conceptual model will remain intact. In the above example, you'd tell clients about /companies but not /employees.
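The three steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `EMPLOYEE_COMPANY` table, the handler names, and the `(status, headers)` return shape are all hypothetical and not tied to any particular web framework.

```python
from typing import Dict, Tuple

# Hypothetical data: each employee row records the company it belongs to.
EMPLOYEE_COMPANY: Dict[str, str] = {"42": "acme"}

# A response is modelled as (status code, headers) for the sake of the sketch.
Response = Tuple[int, Dict[str, str]]


def handle_sub_resource(company_id: str, employee_id: str) -> Response:
    """Handle any method on /companies/{company_id}/employees/{employee_id}.

    Performs the single membership check, then redirects to the canonical
    URI instead of carrying out the operation here.
    """
    if EMPLOYEE_COMPANY.get(employee_id) != company_id:
        return 404, {}
    # 307 asks the client to repeat the same method and body at the new URI.
    return 307, {"Location": f"/employees/{employee_id}"}


def handle_canonical(employee_id: str) -> Response:
    """Handle GET on the canonical URI /employees/{employee_id}."""
    if employee_id not in EMPLOYEE_COMPANY:
        return 404, {}
    return 200, {}
```

In this sketch a request for `/companies/acme/employees/42` is answered with a 307 whose `Location` is `/employees/42`, while `/companies/no-such-company/employees/42` gets a 404, because the one membership check fails without a separate company look-up.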

This approach has the following benefits:

  1. It eliminates the need to do unnecessary database look-ups.
  2. It reduces the number of sanity-checks to one per request. At most, I have to check whether an employee belongs to a company, but I no longer have to do two validation checks for /companies/{companyId}/employees/{employeeId}/computers/{computerId}.
  3. It has a mixed impact on database scalability. On the one hand, you are reducing lock contention by locking fewer tables, for a shorter period of time. On the other hand, you are increasing the possibility of deadlocks because each root resource must use a different locking order. I have no idea whether this is a net gain or loss, but I take comfort in the fact that database deadlocks cannot be prevented anyway and the resulting locking rules are simpler to understand and implement. When in doubt, opt for simplicity.
  4. Our conceptual model remains intact. By ensuring that the specification document only exposes our conceptual model, we are free to drop URIs containing implementation details in the future without breaking existing clients. Remember, nothing prevents you from exposing implementation details in intermediate URIs so long as your specification declares their structure as undefined.