I'm learning about multi-tenant applications and how PostgreSQL's schemas can be used for this.
Researching the subject, I ended up finding an article in which the author describes a poor experience when using PostgreSQL's schemas in multi-tenant applications. The main problems would be having bad performance for migrations and high usage of database resources.
It seems like having only one schema (sharing the tables among the tenants) would lead to better performance than having one separated schema for each tenant. But it feels strange to me. I would think the opposite, since indexes on smaller tables tend to be lighter than indexes on larger tables.
Why would the performance be worse when having data separated in a lot of small tables (in multiple schemas), than having data separated in a few huge tables (in a single schema)?
Performance isn't worse, necessarily. As the article explains, there are specific conditions which make the schema approach better or worse depending on your application design and workload. Let me explain the tradeoffs of the "tenant-schema" vs. "shared-table" approaches:
tenant-schema is best when you have a relatively small number of fairly large tenants. An example of this would be an accounting application, with only paid subscription users. Things which make it the better performing option for you include:
Things which make it a poor-performing option include:
Whether tenant-schema is bad for migrations/schema changes really depends on how you're doing them. It's bad for rolling out a universal schema change quickly, but good for deploying schema changes as a gradual rollout across tenants.
shared-table works better for situations when you have a lot of tenants, and a lot of your tenants have very little data. An example of this would be a social medial mobile application which permits free accounts and thus has thousands of abandoned accounts. Other things which make the shared table model beneficial are:
The main drawback of shared-table is the need to append the tenant filter condition onto every single query in the application layer. It's also problematic because:
So which model "performs better" really depends on which tradeoffs hurt you the worst.
There's also a hybrid model, "tenant-view", where the actual data is stored in shared tables, but each application connection uses security barrier views to view the data. This has some of the tradeoffs of each model. Primarily, it has the security benefits of the tenant-schema model with some of the performance drawbacks of both models.