I used to think schema were the "upper wrapper" object before the database itself.
I mean DB.schema.<what_ever_object_name_under_schema>
.
Well, the catalog "wrapper" is now quite confusing. Why should we need a catalog? For what purpose, precisely should the catalog be used?
Mike Sherrill 'Cat Recall' gave an excellent answer. I'll add simply one example: Postgres.
When you install Postgres on a machine, that installation is called a cluster. ‘Cluster’ here is not meant in the hardware sense of multiple computers working together. In Postgres, cluster refers to the fact that you can have multiple unrelated databases all up and running using the same Postgres server engine.
The word cluster is also defined by the SQL Standard in the same way as in Postgres. Closely following the SQL Standard is a primary goal of the Postgres project.
The SQL-92 specification says:
A cluster is an implementation-defined collection of catalogs.
and
Exactly one cluster is associated with an SQL-session
That's an obtuse way of saying a cluster is a database server (each catalog is a database).
So in both Postgres and the SQL Standard we have this containment hierarchy:
This diagram represents a single cluster. In the case of Postgres, you can have more than one cluster per host computer (or virtual OS). Multiple clusters is commonly done, for testing and deploying new versions of Postgres (ex: 9.0, 9.1, 9.2, 9.3, 9.4, 9.5).
If you did have multiple clusters, imagine the diagram above duplicated.
Different port numbers allow the multiple clusters to live side-by-side all up and running at the same time. Each cluster would be assigned its own port number. The usual 5432
is only the default, and can be set by you. Each cluster is listening on its own assigned port for incoming database connections.
For example, a company could have two different software development teams. One writes software to manage the warehouses while the other team builds software to manage sales and marketing. Each dev team has their own database, blissfully unaware of the other’s.
But the IT operations team took a decision to run both databases on a single computer box (Linux, Mac, whatever). So on that box they installed Postgres. So one database server (database cluster). In that cluster, they create two catalogs, a catalog for each dev team: one named 'warehouse' and one named 'sales'.
Each dev team uses many dozens of tables with different purposes and access roles. So each dev team organizes their tables into schemas. By coincidence, both dev teams do some tracking of accounting data, so each team happens to have a schema named 'accounting'. Using the same schema name is not a problem because the catalogs each have their own namespace so no collision.
Furthermore, each team eventually creates a table for accounting purposes named 'ledger'. Again, no naming collision.
You can think of this example as a hierarchy…
Postgres 9.2
cluster (installation)
warehouse
catalog (database)
inventory
schema
accounting
schema
ledger
tablesales
catalog (database)
selling
schema
accounting
schema (coincidental same name as above)
ledger
table (coincidental same name as above)Postgres 9.3
cluster
Each dev team's software makes a connection to the cluster. When doing so, they must specify which catalog (database) is theirs. Postgres requires that you connect to one catalog, but you are not limited to that catalog. That initial catalog is merely a default, used when your SQL statements omit the name of a catalog.
So if the dev team ever needs to access the other team's tables, they may do so if the database administrator has given them privileges to do so. Access is made with explicit naming in the pattern: catalog.schema.table. So if the 'warehouse' team needs to see the other team’s ('sales' team) ledger, they write SQL statements with sales.accounting.ledger
. To access their own ledger, they merely write accounting.ledger
. If they access both ledgers in the same piece of source code, they may choose to avoid confusion by including their own (optional) catalog name, warehouse.accounting.ledger
versus sales.accounting.ledger
.
By the way…
You may hear the word schema used in a more general sense, meaning the entire design of a particular database's table structure. By contrast, in the SQL Standard the word means specifically the particular layer in the Cluster > Catalog > Schema > Table
hierarchy.
Postgres uses both the word database as well as catalog in various places such as the CREATE DATABASE command.
Not all database system provides this full hierarchy of Cluster > Catalog > Schema > Table
. Some have only a single catalog (database). Some have no schema, just one set of tables. Postgres is an exceptionally powerful product.