wiki:replicating

Version 3 (modified by niles, 3 months ago) (diff)

--

Postgres 10 Logical Replication

See also https://blog.2ndquadrant.com/logical-replication-postgresql-10/ although I'm not sure how long that will hang around, the internet being what it is.

These are my notes on getting logical replication going in postgres between a "master" and a "slave" node. Both systems should have port 5432 (the default postgres port) open to each other. On CentOS 7, this required the following commands as root :

# firewall-cmd --zone=public --add-port=5432/tcp --permanent
# firewall-cmd --reload

And do this to be sure port 5432 is listed as being open :

# firewall-cmd --list-ports

There may be other firewall issues to address depending on your setup.

PART 1 : THINGS TO DO ON BOTH NODES

Install PG 10 or greater, so as root :

# yum install postgresql10-server postgresql10-contrib postgresql10-devel

Strictly speaking you don't need postgresql10-devel, but I use it for other stuff so I grab it.

Then init the database and make postgres run by default :

# /usr/pgsql-10/bin/postgresql-10-setup initdb
# systemctl start postgresql-10
# systemctl enable postgresql-10

Users using postgres 10 would be wise to add the PG10 binary directory to the start of their path, so add this to $HOME/.bashrc :

PATH=/usr/pgsql-10/bin:$PATH; export PATH

You should then be able to become the linux user 'postgres' :

# su - postgres

And run 'psql', set the superuser password and exit the database :

postgres=# ALTER USER postgres WITH PASSWORD 'bigBadTruck';
postgres=# \q

Then as 'postgres' edit the file $HOME/10/data/postgresql.conf and change the line :

#wal_level = replica

To :

wal_level = logical

Still as postgres, edit the pg_hba.conf file. It will be named something like 10/data/pg_hba.conf. If the IP addresses of the two machines are 100.37.24.123 and 100.37.24.122, I set it up to look like this on both machines :

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     md5
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            ident
host    all             all             127.0.0.1/32            md5
host    all             all             localhost               md5
host    all             all             localhost               ident
host    bladerunner     deckard         100.37.24.123/32         md5
host    bladerunner     deckard         100.37.24.122/32         md5
host    bladerunner     postgres        100.37.24.123/32         md5
host    bladerunner     postgres        100.37.24.122/32         md5
# IPv6 local connections:
host    all             all             ::1/128                 ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
hostnossl    replication     all             100.37.24.122/32         md5
hostnossl    replication     all             100.37.24.123/32         md5
host    replication     all             127.0.0.1/32            ident
host    replication     all             ::1/128                 ident

'deckard' and 'bladerunner' are a user and a database we'll set up in a bit.

That is almost certainly overkill as it allows full communication both ways, but that may be good for testing here. It is probably worth looking into using 'host' rather than 'hostnossl', too, which I *think* needs encryption.

And then restart the database with :

/usr/pgsql-10/bin/pg_ctl restart -D /var/lib/pgsql/10/data

There will likely be more database restarts required, it may be worth putting that restart command in a script in the home directory for user postgres.

You should then be able to revert to being logged in as whoever you are (the default for me being 'noien') and edit the file $HOME/.pgpass to look like :

$ cat $HOME/.pgpass
localhost:*:*:postgres:bigBadTruck

Set protections on that file appropriately :

$ chmod 600 $HOME/.pgpass

You should then be able, as yourself, to access the database with :

$ psql -Upostgres

Check that you can, and once at the database prompt, create a user with replication :

CREATE USER deckard WITH PASSWORD 'origami';
CREATE DATABASE bladerunner;
GRANT ALL PRIVILEGES ON DATABASE bladerunner TO deckard;
ALTER USER deckard WITH REPLICATION;

Exit the database prompt and add this user to $HOME/.pgpass so that the file looks like :

$ cat $HOME/.pgpass
localhost:*:*:postgres:bigBadTruck
localhost:*:*:deckard:origami

You should then be able to get to the database prompt as user deckard using database bladerunner :

$ psql -Udeckard bladerunner 

At that prompt, create a simple table :

DROP TABLE IF EXISTS expireTimes;
CREATE TABLE expireTimes ( retirement VARCHAR(255) );

You need to do this on both master and slave nodes, replication will not make the table for you on the slave node.

Then check connectivity between the two machines. On both machines, add the users for the database on the other machine to the $HOME/.pgpass file, like so :

$ cat $HOME/.pgpass
localhost:*:*:postgres:bigBadTruck
localhost:*:*:deckard:origami
the.other.machine.name:5432:bladerunner:deckard:origami
the.other.machine.name:5432:*:postgres:bigBadTruck

You should then, on both machines, be able to access the other machine both as the postgres user and the deckard user, using the bladerunner database, like so :

psql -h the.other.machine.name -U postgres bladerunner
psql -h the.other.machine.name -U deckard bladerunner

Making sure those commands work at the linux prompt ensures that the connectivity between the databases on the two machines is OK.

So, you now have postgres 10 running on two systems. Both of them have a user named deckard, with a database named bladerunner, and in that database is an empty table named expireTimes. Connectivity between the two machines is OK.

PART 2 : SETUP ON THE MASTER NODE

In order to get some data flowing through the expireTimes table on the master node, write a script called "doInsert.sh" that looks like this :

#!/bin/bash
t=`date +"%Y/%m/%d %H:%M:%S %B %A %d"`
echo INSERT INTO expireTimes VALUES \(\'$t\'\)\; | psql -Udeckard bladerunner
exit 0

Then set executable status on that file :

$ chmod +x doInsert.sh 

And similarly write a script called "doDelete.sh" that looks like :

#!/bin/bash
sleep 30;
t=`date --date="1 hour ago" +"%Y/%m/%d %H:%M:%S %B %A %d"`
echo DELETE FROM expireTimes WHERE retirement \< \'$t\'\; | psql -Udeckard bladerunner
exit 0

And again set executable status on the file. The first script will insert the current time (in a somewhat screwball format) into the database, the second script will delete times that are older than an hour.

Run them both in cron, running the insert one every minute and the delete one every ten minutes, like so :

* * * * * /home/noien/replication/setup/doInsert.sh &> /home/noien/replication/setup/doInsert.log
*/10 * * * * /home/noien/replication/setup/doDelete.sh &> /home/noien/replication/setup/doDelete.log

Check that the times are going into the table :

$ psql -Udeckard bladerunner
psql (10.3)
Type "help" for help.

bladerunner=> select retirement from expireTimes order by retirement;
             retirement              
-------------------------------------
 2018/04/13 14:51:01 April Friday 13
 2018/04/13 14:52:01 April Friday 13
 2018/04/13 14:53:01 April Friday 13
 2018/04/13 14:54:01 April Friday 13
 2018/04/13 14:55:01 April Friday 13

After a while, there should always be between 60 and 70 entries in the table :

bladerunner=> select count(*) from expireTimes;
 count 
-------
    60

So, this is the little test table that we are going to replicate on the slave node. We have to "publish" this table on the master node.a Still in the database :

bladerunner=> CREATE PUBLICATION harrison FOR TABLE expireTimes; 

Interestingly, after publishing the delete script will not be able to delete entries from the table. Attempts to delete will result in an error :

ERROR:  cannot delete from table "expiretimes" because it does not have a replica identity and publishes deletes
HINT:  To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.

So to follow that hint, do the following :

bladerunner=> ALTER TABLE expireTimes REPLICA IDENTITY FULL; 

After which deletes can happen again. I'm not sure that FULL is the optimal setting for all cases, more reading may be required.

We should now be able to see the publication on the master node :

bladerunner=> \dRp+
                Publication harrison
  Owner  | All tables | Inserts | Updates | Deletes 
---------+------------+---------+---------+---------
 deckard | f          | t       | t       | t
Tables:
    "public.expiretimes"

PART 3 : SETUP ON THE SLAVE NODE

Subscriptions must be done by the database super user, postgres. So log in to the slave machine and go to the bladerunner database as postgres with :

 $ psql -Upostgres bladerunner 

Then create a subscription that connects to the publication we have on the master machine :

 bladerunner=# CREATE SUBSCRIPTION ford CONNECTION 'host=the.master.machine.name dbname=bladerunner user=deckard password=origami' PUBLICATION harrison; 

Note that you do need to specify the password, the .pgpass file does not affect the CREATE SUBSCRIPTION command. I'm not sure what would happen if the password on the master node were to change.

You should then be able to see the subscription :

bladerunner=# \dRs+
                                                          List of subscriptions
 Name |  Owner   | Enabled | Publication | Synchronous commit |                                 Conninfo                                  
------+----------+---------+-------------+--------------------+---------------------------------------------------------------------------
 ford | postgres | t       | {harrison}  | off                | host=the.master.machine.name dbname=bladerunner user=deckard password=origami
(1 row)

And monitor it in more depth with the pg_stat_subscription table :

bladerunner=# SELECT * FROM pg_stat_subscription;
 subid | subname |  pid  | relid | received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time        
-------+---------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 16413 | ford    | 22102 |       | 0/199F168    | 2018-04-13 16:57:05.488469-06 | 2018-04-13 16:56:43.719121-06 | 0/199F168      | 2018-04-13 16:57:05.488469-06

Similarly on the master machine you should be able to access the database (this time there's no need to be the postgres user) with :

 $ psql -Udeckard bladerunner 

And then monitor the other end by looking at the pg_stat_replication table :

bladerunner=> select pid, usesysid, usename, application_name, client_addr, client_hostname, client_port, backend_start from pg_stat_replication;
  pid  | usesysid | usename | application_name | client_addr   |     client_hostname      | client_port |         backend_start         
-------+----------+---------+------------------+---------------+--------------------------+-------------+-------------------------------
 31018 |    16384 | deckard | ford             | 100.37.24.123 | the.slave.machine.name   |       32826 | 2018-04-13 16:38:52.349168-06

And the two tables on the two machines should then be mirror images of each other, with updates on the master popping up on the slave (also sometimes called the 'standby') machine.

NOTE : If a new table is added to the publication on the master, the slave subscription will not see them by default. In that case you would have to do this on the slave node :

 bladerunner=# ALTER SUBSCRIPTION ford REFRESH PUBLICATION; 

Niles Oien April 2018.