Hello Friends,
In this post we will discuss PostgreSQL in detail: its memory components, process architecture, background processes, and physical files. So let's get started.
Origin:
PostgreSQL (often referred to as Postgres) is an open-source object-relational database management system with a focus on extensibility and standards compliance. It is a widely used and well-tested solution, with roots in the Ingres project developed at the University of California, Berkeley in the 1970s. The successor project, begun in 1986, was named Postgres as it came "post-Ingres"; in 1996 the project was renamed PostgreSQL to reflect its support for SQL, and the online presence at PostgreSQL.org was launched.
The first PostgreSQL release was version 6.0, on January 29, 1997, and since then PostgreSQL has continued to be developed by the PostgreSQL Global Development Group, a diverse group of companies and many thousands of individual contributors. As an open-source solution, it is free and released under the terms of the PostgreSQL License, a permissive software license.
The PostgreSQL project continues to
make major releases (approximately annually) and minor bug fix releases, all
available under its free and open-source software PostgreSQL License. It can
handle workloads ranging from small single-machine applications to large
Internet-facing applications with many concurrent users. PostgreSQL runs on all
major operating systems including Linux, UNIX variants, and Windows, and is
fully ACID-compliant, with full support for foreign keys, joins, views,
triggers, and stored procedures (in multiple languages).
PostgreSQL is highly scalable both in
the sheer quantity of data it can manage and in the number of concurrent users
it can accommodate. There are active PostgreSQL systems in production
environments that manage in excess of 4 TB of data.
As an enterprise-class database, PostgreSQL boasts features such as:
- Multi-Version Concurrency Control (MVCC)
- point-in-time recovery
- tablespaces
- asynchronous replication
- nested transactions (savepoints)
- online/hot backups
- a sophisticated query planner/optimizer, and
- write-ahead logging for fault tolerance.

It also supports international character sets, multibyte character encodings, and Unicode, and is locale-aware for sorting, case-sensitivity, and formatting.
PostgreSQL Memory Components
PostgreSQL uses a client-server architecture. The client and server can be on the same host or on different hosts (most often different), and they communicate over a TCP/IP-based protocol.
SHARED MEMORY: When the server starts, it occupies some memory from RAM; the exact amount depends on various parameters. Whenever we start the database cluster by executing the pg_ctl utility with the start option, a postgres server process starts up. It then allocates a shared memory area, starts the various background processes (including replication-associated processes and background worker processes if necessary), and waits for connection requests from clients. This area is used by all processes of a PostgreSQL server.
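The main shared-memory sizing knobs live in postgresql.conf. A minimal sketch, with purely illustrative values (not tuning advice):

```ini
# postgresql.conf -- illustrative values only
shared_buffers = 128MB   # shared buffer pool for data pages
wal_buffers = 16MB       # WAL staging area before flushing to WAL segments
max_connections = 100    # upper bound on concurrent client backends
```

A server restart is required for these particular parameters to take effect.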
WAL Buffers: WAL buffers are like a logbook where changes are recorded. When an UPDATE occurs, the data is modified in the shared buffers and a log entry is written to the WAL buffer, including both the old and the new data. All incremental changes are recorded in the WAL buffer. To ensure that no data is lost on server failure, PostgreSQL supports the WAL mechanism: WAL data (also referred to as XLOG records) is the transaction log of PostgreSQL, and the WAL buffer is a staging area for WAL data before it is written to persistent storage.
CLOG Buffers: CLOG stands for "commit log", and the CLOG buffers are an area of shared memory dedicated to holding commit-log pages. The commit-log pages contain transaction metadata and differ from the WAL data: the commit log keeps the state of every transaction (e.g. in_progress, committed, aborted) for the concurrency-control mechanism, indicating whether or not a transaction has completed (committed). There is no specific parameter to control this area of memory; it is automatically managed by the database engine in small amounts. It is a shared memory component, accessible to all the background server and user processes of a PostgreSQL database.
Memory for Locks / Lock Space: This memory component stores all the heavyweight locks used by the PostgreSQL instance. These locks are shared across all the background server and user processes connecting to the database. A non-default, larger setting of two database parameters, max_locks_per_transaction and max_pred_locks_per_transaction, influences the size of this memory component.
PROCESS MEMORY / MEMORY ARCHITECTURE (Local Memory Area or Backend Process Area): This memory area is allocated by each backend process for its own use; it is temporary and private to each postgres process. By default, each session takes about 4 MB (the work_mem default), so, for example, 100 sessions sorting at the same time could consume 400 MB.
Vacuum Buffers: This is the maximum amount of memory used by each of the autovacuum worker processes, controlled by the autovacuum_work_mem database parameter. The memory is allocated from the operating system RAM, and the total is also influenced by the autovacuum_max_workers database parameter: autovacuum_work_mem should be configured carefully, since up to autovacuum_max_workers times this amount can be allocated from RAM. These settings only come into play when the autovacuum daemon is enabled; otherwise, they have no effect on the behaviour of VACUUM run in other contexts. This memory component is not shared with any other background server or user process.
Work Memory: This is the amount of memory reserved for a single sort or hash-table operation in a query, controlled by the work_mem database parameter. A sort operation could come from an ORDER BY, DISTINCT, or merge join; a hash-table operation could be due to a hash join, hash-based aggregation, or an IN subquery. A single complex query may contain many such sort or hash-table operations, and a separate chunk of memory up to work_mem will be allocated for each of them in a user connection. For this reason, work_mem should not be set to a very large value, as a sufficiently complex query could then aggressively consume all the available memory, starving the operating system of RAM needed for other processes.
Executor uses this area for sorting
tuples by ORDER BY and DISTINCT operations, and for joining tables by
merge-join and hash-join operations.
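As a sketch, work_mem can be raised for a single session before a heavy sort; the table and column names here are hypothetical:

```sql
SET work_mem = '64MB';     -- affects only the current session

-- A query whose ORDER BY may now sort in memory instead of spilling
-- to temporary files under pgsql_tmp
EXPLAIN (ANALYZE) SELECT * FROM big_table ORDER BY created_at;

RESET work_mem;            -- back to the server default
```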
Maintenance Work Memory: This is the maximum amount of RAM consumed by maintenance operations, such as VACUUM, CREATE INDEX, or adding a FOREIGN KEY to a table. The setting is controlled by the maintenance_work_mem database parameter. A database session can execute only one of these maintenance operations at a time, and a PostgreSQL instance does not normally execute many of them concurrently, so this parameter can be set significantly larger than work_mem. One point of caution: if autovacuum_work_mem is not configured, up to autovacuum_max_workers portions of this memory can be allocated for the autovacuum workers, so do not set it to a very high value. Maintenance operations like VACUUM and REINDEX use this area.
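For example, a session might raise maintenance_work_mem just for an index build; the table and index names are hypothetical:

```sql
SET maintenance_work_mem = '512MB';  -- affects only the current session
CREATE INDEX idx_orders_customer ON orders (customer_id);
RESET maintenance_work_mem;
```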
Temp Buffers: A database may have one or more
temporary tables, and the data blocks (pages) of such temporary tables need a
separate allocation of memory to be processed in. The temp buffers serve this
purpose by utilizing a portion of RAM, defined by the temp_buffers parameter.
The temp buffers are only used for access to temporary tables in a user
session. There is no relation between temp buffers in memory and the temporary
files that are created under the pgsql_tmp directory during large sort and hash
table operations. Executor uses this area for sorting temporary tables.
BACKGROUND PROCESSES: These are the processes that start when the PostgreSQL server starts; most of them are mandatory for the server's operation.
Postmaster: The postmaster is the first process that starts when the PostgreSQL cluster starts, and it acts as a supervisor. It listens on the server side: every new connection first reaches the postmaster, and for each connection the postmaster creates a "postgres" process. The postmaster listens on port 5432 by default; the port can be reconfigured at any time, followed by a cluster restart. There is one postmaster per cluster, and this PostgreSQL server process, the postmaster, is the parent of all processes in a PostgreSQL server.
Backend/Postgres/User Process: When a new connection request comes in, the postmaster creates a new process called "postgres" once the connection is authorized and authenticated. So for every new client connection a new postgres process is created, and from then on the client interacts directly with that postgres process.
A backend process, also called postgres, is started by the postmaster for each client connection and handles all queries issued by that one connected client. It communicates with the client over a single TCP connection, uses semaphores and shared memory, and terminates when the client disconnects.
As a backend is allowed to operate on only one database, we must explicitly specify the database we want to use, authenticating with a username and password, when connecting to a PostgreSQL server.
PostgreSQL allows multiple clients to connect simultaneously; the configuration parameter max_connections controls the maximum number of clients (default is 100).
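The configured limit and current usage can be checked from any session, for example:

```sql
SHOW max_connections;                   -- configured limit (default 100)
SELECT count(*) FROM pg_stat_activity;  -- backends currently connected
```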
Vacuum and Autovacuum: Whenever we run an UPDATE command, the system does not change the actual data; it inserts a new row version referencing the old one and marks the old row as dead. Similarly, a DELETE does not remove the actual data immediately; the row is simply marked as deleted. By default, autovacuum runs every few minutes and cleans all such dead rows from the database.
VACUUM reclaims storage occupied by dead tuples (deleted or updated rows). ANALYZE collects statistics for each table. The syntax is: VACUUM [({FULL | FREEZE | VERBOSE | ANALYZE} [, ...])] [table [(column [, ...])]]
Plain VACUUM (without FULL) simply reclaims space and can operate in parallel with normal reading and writing of the table.
VACUUM FULL locks the entire table and rewrites its entire contents into a new file with no dead space, allowing unused space to be returned to the OS.
Vacuum processing has been automated with the autovacuum daemon. The daemon periodically invokes several autovacuum worker processes: by default, it wakes up every 1 minute (defined by autovacuum_naptime) and invokes up to three workers (defined by autovacuum_max_workers).
VACUUM FREEZE aggressively freezes a table's rows (marking their transaction IDs as frozen), telling PostgreSQL they do not need to be vacuumed for wraparound again until they are next modified.
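The variants above look like this in practice; the table name is illustrative:

```sql
VACUUM orders;                     -- plain: reclaims dead tuples, non-blocking
VACUUM (VERBOSE, ANALYZE) orders;  -- also prints progress and refreshes stats
VACUUM FULL orders;                -- rewrites the table; takes an exclusive lock
VACUUM FREEZE orders;              -- aggressively freezes old transaction IDs
```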
Autovacuum Launcher: It periodically requests the postgres server to start autovacuum worker processes, which carry out the actual vacuum work.
Stats Collector: It collects statistics about the activity happening in the database: where a table resides, its size, type, and cardinality, where its indexes are, which sessions are connecting, how many users are logged in, and so on. The stats collector saves this data into statistics tables (a statistics dictionary) in the data files. Collection is carried out periodically, and the interval can be configured; statistics collection can also be triggered manually.
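These statistics are exposed through system views, for example:

```sql
SELECT datname, numbackends, xact_commit, xact_rollback
FROM pg_stat_database;              -- per-database activity counters

SELECT pid, usename, state, query
FROM pg_stat_activity;              -- sessions currently connected
```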
Background Writer: This process gradually writes dirty pages from the shared buffer pool to persistent storage (e.g. HDD, SSD) on a regular basis.
WAL Writer: It writes from the WAL buffers to the actual WAL files, periodically writing and flushing the WAL data in the WAL buffer to persistent storage.
Logging Collector (logger): It writes server messages into the log files; basically, all error logs are recorded by this process and written to the log files.
Checkpointer Process: It is responsible for triggering checkpoints in PostgreSQL. A checkpoint is a trigger for the writer process to write all dirty pages to disk.
Replication Processes: These are launched when we set up replication for the database server.
bgworker (logical replication): Used for replication.
Statistics Collector: In this process, statistics information, such as that shown in pg_stat_activity and pg_stat_database, is collected (see the stats collector above).
Archiver: In this process, WAL archiving is executed: completed WAL files are copied to the archive location.
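Archiving is switched on in postgresql.conf; a minimal sketch, where the destination path is only an example:

```ini
# postgresql.conf -- illustrative archiving setup
archive_mode = on                           # requires a server restart
archive_command = 'cp %p /mnt/archive/%f'   # %p = WAL file path, %f = file name
```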
Logical Structure: A cluster can have any number of databases. Every database contains schemas, and each schema contains tables, indexes, views, functions, etc. By default, the schema is named 'public'. The cluster also contains cluster-level tablespaces, and different databases can share the same tablespaces. Users are also created at the cluster level, so different users can access different databases in a cluster: users are not specific to a database, and by default they have login privileges on all the databases.
Cluster
  |-> Database
        |-> Schema
              |-> Tables/Indexes/Views/Functions
  |-> Tablespaces
  |-> Users
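A sketch of how these levels are created; all the names (and the password) are hypothetical:

```sql
CREATE DATABASE appdb;                           -- cluster level
CREATE TABLESPACE fastspace LOCATION '/mnt/ssd'; -- cluster level, shareable
CREATE USER app_user WITH LOGIN PASSWORD 'secret';

-- after connecting to appdb:
CREATE SCHEMA sales;
CREATE TABLE sales.orders (id integer PRIMARY KEY);
```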
PHYSICAL FILES:
Data Files: These contain all the data required by users for processing. Data from the data files is first moved to the shared buffers, and then processed accordingly.
WAL Files/Segments: WAL buffers are linked with WAL files (segments). The WAL buffer flushes its data into a WAL segment whenever a COMMIT occurs. COMMIT is the command that marks the end of a transaction; by default, PostgreSQL commits after every command (auto-commit), which can be changed accordingly.
Archived WAL Files: These are copies of WAL segments, storing the archived WAL data so it can be kept for a long period. The WAL area has limited storage capacity: once its space is exhausted, the system starts recycling old segments and overwriting their contents, which would otherwise mean losing the old WAL data. E.g., suppose we take a complete backup of database X every Sunday. If X crashes on Wednesday, we can restore the Sunday backup and, with the help of archive-restore procedures, re-apply all the changes that happened in the database from Sunday to Wednesday.
Archives are only required back to the last full backup of the database; older archives can be deleted in order to regain some space.
Error Log Files: These files contain all the error messages, warning messages, informational messages, and messages about the major things happening in the database. Logs of DDL and DML changes are not stored here. Internal errors and internal bug entries are stored in these files, which helps the admin troubleshoot problems.