I am very new to databases, I haven't worked lot on it. Now I want to understand the term database clusters. I googled a lot and found many useful links but I am not able to understand them - maybe because I have very little basic knowledge about databases and also they were in very techy language.
I need advice on these points:
A PostgreSQL database "cluster" is a postmaster and a group of subsiduary processes, all managing a shared data directory that contains one or more databases.
The term "cluster" in PostgreSQL is a historical quirk*, and is completely different to the general meaning of "compute cluster", which normally refers to groups of computers that work together to achieve higher performance and/or availability. It is also un-related to the PostgreSQL command CLUSTER
, which is about organizing tables.
If you're reading this you might actually be looking for information on high availability, replication or pooling, in which case you should read the Replication, Clustering and High Availability wiki article and the high availability section of the PostgreSQL manual, then look into tools like repmgr.
A cluster is normally created for you when you install PostgreSQL; the installation will usually initdb
a new cluster for you. It is quite unusual for a basic or intermediate user to ever need to create clusters or manage multiple clusters, so it would help if you explained why you want to do this, and what the underlying problem you are trying to solve is. The user manual could probably explain this better, since it assumes you're installing PostgreSQL from source and relatively few people actually do that.
Each cluster's data directory is created with initdb
and managed with a postmaster that's started via a system service (Windows service, launchd
, init
, upstart
, systemd
, etc depending on operating system and version) or directly via pg_ctl
.
The cluster has built-in databases template0
, template1
and postgres
; other databases are created by the user.
The postmaster for a cluster accepts incoming connections by listening on a tcp port, and hands those off to worker backends. Only one postmaster may run on a given port, so each cluster must have a different port.
I wrote more about PostgreSQL's structure in this previous answer. See the sub-heading "Relations? Schema? Huh?".
How to "create" clusters in Pg depends entirely on how you are running it. Since you're asking, I suspect you're on an Ubuntu system that uses pg_wrapper
, in which case you'd use the pg_wrapper
commands like pg_createcluster
.
* The confusion between a "cluster" in PostgreSQL terminology and the common usage of the term "cluster" is a confusing and regrettable historical oddity, especially when discussing clustering of PostgreSQL instances. You can have a cluster of PostgreSQL clusters, which is just painful.