| 1 | |
| 2 | === Postgres 10 Logical Replication === |
| 3 | |
| 4 | These are my notes on getting logical replication going in postgres |
| 5 | between a "master" and a "slave" node. Both systems should have port 5432 |
| 6 | (the default postgres port) open to each other. On CentOS 7, this |
| 7 | required the following commands as root : |
| 8 | |
| 9 | {{{ |
| 10 | # firewall-cmd --zone=public --add-port=5432/tcp --permanent |
| 11 | # firewall-cmd --reload |
| 12 | }}} |
| 13 | |
| 14 | And do this to be sure port 5432 is listed as being open : |
| 15 | {{{ |
| 16 | # firewall-cmd --list-ports |
| 17 | }}} |
| 18 | |
| 19 | There may be other firewall issues to address depending on your setup. |
| 20 | |
| 21 | === PART 1 : THINGS TO DO ON BOTH NODES === |
| 22 | |
| 23 | Install PG 10 or greater, so as root : |
| 24 | |
| 25 | {{{ |
| 26 | # yum install postgresql10-server postgresql10-contrib postgresql10-devel |
| 27 | }}} |
| 28 | |
| 29 | Strictly speaking you don't need postgresql10-devel, but I use it for other stuff so |
| 30 | I grab it. |
| 31 | |
| 32 | Then init the database and make postgres run by default : |
| 33 | |
| 34 | {{{ |
| 35 | # /usr/pgsql-10/bin/postgresql-10-setup initdb |
| 36 | # systemctl start postgresql-10 |
| 37 | # systemctl enable postgresql-10 |
| 38 | }}} |
| 39 | |
| 40 | Users using postgres 10 would be wise to add the PG10 binary directory |
| 41 | to the start of their path, so add this to $HOME/.bashrc : |
| 42 | |
| 43 | {{{ |
| 44 | PATH=/usr/pgsql-10/bin:$PATH; export PATH |
| 45 | }}} |
| 46 | |
| 47 | You should then be able to become the linux user 'postgres' : |
| 48 | |
| 49 | {{{ |
| 50 | # su - postgres |
| 51 | }}} |
| 52 | |
| 53 | And run 'psql', set the superuser password and exit the database : |
| 54 | |
| 55 | {{{ |
| 56 | postgres=# ALTER USER postgres WITH PASSWORD 'bigBadTruck'; |
| 57 | postgres=# \q |
| 58 | }}} |
| 59 | |
| 60 | Then as 'postgres' edit the file $HOME/10/data/postgresql.conf and change the line : |
| 61 | {{{ |
| 62 | #wal_level = replica |
| 63 | }}} |
| 64 | To : |
| 65 | {{{ |
| 66 | wal_level = logical |
| 67 | }}} |
| 68 | |
| 69 | Still as postgres, edit the pg_hba.conf file. It will be named something like |
| 70 | 10/data/pg_hba.conf. If the IP addresses of the two machines are 100.37.24.123 |
| 71 | and 100.37.24.122, I set it up to look like this on both machines : |
| 72 | |
| 73 | {{{ |
| 74 | # TYPE DATABASE USER ADDRESS METHOD |
| 75 | |
| 76 | # "local" is for Unix domain socket connections only |
| 77 | local all all md5 |
| 78 | local all all peer |
| 79 | # IPv4 local connections: |
| 80 | host all all 127.0.0.1/32 ident |
| 81 | host all all 127.0.0.1/32 md5 |
| 82 | host all all localhost md5 |
| 83 | host all all localhost ident |
| 84 | host bladerunner deckard 100.37.24.123/32 md5 |
| 85 | host bladerunner deckard 100.37.24.122/32 md5 |
| 86 | host bladerunner postgres 100.37.24.123/32 md5 |
| 87 | host bladerunner postgres 100.37.24.122/32 md5 |
| 88 | # IPv6 local connections: |
| 89 | host all all ::1/128 ident |
| 90 | # Allow replication connections from localhost, by a user with the |
| 91 | # replication privilege. |
| 92 | local replication all peer |
| 93 | hostnossl replication all 100.37.24.122/32 md5 |
| 94 | hostnossl replication all 100.37.24.123/32 md5 |
| 95 | host replication all 127.0.0.1/32 ident |
| 96 | host replication all ::1/128 ident |
| 97 | }}} |
| 98 | |
| 99 | 'deckard' and 'bladerunner' are a user and a database we'll set up in a bit. |
| 100 | |
| 101 | That is almost certainly overkill as it allows full communication |
| 102 | both ways, but that may be good for testing here. It is probably worth |
| 103 | looking into using 'host' rather than 'hostnossl', too, which I *think* needs |
| 104 | encryption. |
| 105 | |
| 106 | And then restart the database with : |
| 107 | {{{ |
| 108 | /usr/pgsql-10/bin/pg_ctl restart -D /var/lib/pgsql/10/data |
| 109 | }}} |
| 110 | |
| 111 | There will likely be more database restarts required, it may be worth putting that restart |
| 112 | command in a script in the home directory for user postgres. |
| 113 | |
| 114 | You should then be able to revert to being logged in as whoever you are (the default |
| 115 | for me being 'noien') and edit the file $HOME/.pgpass to look like : |
| 116 | |
| 117 | {{{ |
| 118 | $ cat $HOME/.pgpass |
| 119 | localhost:*:*:postgres:bigBadTruck |
| 120 | }}} |
| 121 | |
| 122 | Set protections on that file appropriately : |
| 123 | {{{ |
| 124 | $ chmod 600 $HOME/.pgpass |
| 125 | }}} |
| 126 | |
| 127 | You should then be able, as yourself, to access the database with : |
| 128 | {{{ |
| 129 | $ psql -Upostgres |
| 130 | }}} |
| 131 | Check that you can, and once at the database prompt, create a user with replication : |
| 132 | |
| 133 | {{{ |
| 134 | CREATE USER deckard WITH PASSWORD 'origami'; |
| 135 | CREATE DATABASE bladerunner; |
| 136 | GRANT ALL PRIVILEGES ON DATABASE bladerunner TO deckard; |
| 137 | ALTER USER deckard WITH REPLICATION; |
| 138 | }}} |
| 139 | |
| 140 | Exit the database prompt and add this user to $HOME/.pgpass so that the file looks like : |
| 141 | |
| 142 | {{{ |
| 143 | $ cat $HOME/.pgpass |
| 144 | localhost:*:*:postgres:bigBadTruck |
| 145 | localhost:*:*:deckard:origami |
| 146 | }}} |
| 147 | |
| 148 | You should then be able to get to the database prompt as user deckard using database bladerunner : |
| 149 | {{{ |
| 150 | $ psql -Udeckard bladerunner |
| 151 | }}} |
| 152 | |
| 153 | At that prompt, create a simple table : |
| 154 | |
| 155 | {{{ |
| 156 | DROP TABLE IF EXISTS expireTimes; |
| 157 | CREATE TABLE expireTimes ( retirement VARCHAR(255) ); |
| 158 | }}} |
| 159 | |
| 160 | You need to do this on both master and slave nodes, replication will not make the |
| 161 | table for you on the slave node. |
| 162 | |
| 163 | Then check connectivity between the two machines. On both machines, add the |
| 164 | users for the database on the other machine to the $HOME/.pgpass file, like so : |
| 165 | |
| 166 | {{{ |
| 167 | $ cat $HOME/.pgpass |
| 168 | localhost:*:*:postgres:bigBadTruck |
| 169 | localhost:*:*:deckard:origami |
| 170 | the.other.machine.name:5432:bladerunner:deckard:origami |
| 171 | the.other.machine.name:5432:*:postgres:bigBadTruck |
| 172 | }}} |
| 173 | |
| 174 | You should then, on both machines, be able to access the other machine both as the |
| 175 | postgres user and the deckard user, using the bladerunner database, like so : |
| 176 | |
| 177 | {{{ |
| 178 | psql -h the.other.machine.name -U postgres bladerunner |
| 179 | psql -h the.other.machine.name -U deckard bladerunner |
| 180 | }}} |
| 181 | |
| 182 | Making sure those commands work at the linux prompt ensures that the connectivity |
| 183 | between the databases on the two machines is OK. |
| 184 | |
| 185 | So, you now have postgres 10 running on two systems. Both of them have a user named |
| 186 | deckard, with a database named bladerunner, and in that database is an empty table named |
| 187 | expireTimes. Connectivity between the two machines is OK. |
| 188 | |
| 189 | |
| 190 | |
| 191 | === PART 2 : SETUP ON THE MASTER NODE === |
| 192 | |
| 193 | In order to get some data flowing through the expireTimes table on the master node, write |
| 194 | a script called "doInsert.sh" that looks like this : |
| 195 | |
| 196 | {{{ |
| 197 | #!/bin/bash |
| 198 | t=`date +"%Y/%m/%d %H:%M:%S %B %A %d"` |
| 199 | echo INSERT INTO expireTimes VALUES \(\'$t\'\)\; | psql -Udeckard bladerunner |
| 200 | exit 0 |
| 201 | }}} |
| 202 | |
| 203 | Then set executable status on that file : |
| 204 | {{{ |
| 205 | $ chmod +x doInsert.sh |
| 206 | }}} |
| 207 | |
| 208 | And similarly write a script called "doDelete.sh" that looks like : |
| 209 | |
| 210 | {{{ |
| 211 | #!/bin/bash |
| 212 | sleep 30; |
| 213 | t=`date --date="1 hour ago" +"%Y/%m/%d %H:%M:%S %B %A %d"` |
| 214 | echo DELETE FROM expireTimes WHERE retirement \< \'$t\'\; | psql -Udeckard bladerunner |
| 215 | exit 0 |
| 216 | }}} |
| 217 | |
| 218 | And again set executable status on the file. The first script will insert the current |
| 219 | time (in a somewhat screwball format) into the database, the second script will delete |
| 220 | times that are older than an hour. |
| 221 | |
| 222 | Run them both in cron, running the insert one every minute and the delete one every |
| 223 | ten minutes, like so : |
| 224 | |
| 225 | {{{ |
| 226 | * * * * * /home/noien/replication/setup/doInsert.sh &> /home/noien/replication/setup/doInsert.log |
| 227 | */10 * * * * /home/noien/replication/setup/doDelete.sh &> /home/noien/replication/setup/doDelete.log |
| 228 | }}} |
| 229 | |
| 230 | Check that the times are going into the table : |
| 231 | |
| 232 | {{{ |
| 233 | $ psql -Udeckard bladerunner |
| 234 | psql (10.3) |
| 235 | Type "help" for help. |
| 236 | |
| 237 | bladerunner=> select retirement from expireTimes order by retirement; |
| 238 | retirement |
| 239 | ------------------------------------- |
| 240 | 2018/04/13 14:51:01 April Friday 13 |
| 241 | 2018/04/13 14:52:01 April Friday 13 |
| 242 | 2018/04/13 14:53:01 April Friday 13 |
| 243 | 2018/04/13 14:54:01 April Friday 13 |
| 244 | 2018/04/13 14:55:01 April Friday 13 |
| 245 | }}} |
| 246 | |
| 247 | After a while, there should always be between 60 and 70 entries in the table : |
| 248 | |
| 249 | {{{ |
| 250 | bladerunner=> select count(*) from expireTimes; |
| 251 | count |
| 252 | ------- |
| 253 | 60 |
| 254 | }}} |
| 255 | |
| 256 | So, this is the little test table that we are going to replicate on the slave node. |
| 257 | We have to "publish" this table on the master node.a Still in the database : |
| 258 | |
| 259 | {{{ bladerunner=> CREATE PUBLICATION harrison FOR TABLE expireTimes; }}} |
| 260 | |
| 261 | |
| 262 | Interestingly, after publishing the delete script will not be able to delete entries |
| 263 | from the table. Attempts to delete will result in an error : |
| 264 | |
| 265 | {{{ |
| 266 | ERROR: cannot delete from table "expiretimes" because it does not have a replica identity and publishes deletes |
| 267 | HINT: To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE. |
| 268 | }}} |
| 269 | |
| 270 | So to follow that hint, do the following : |
| 271 | |
| 272 | {{{ |
| 273 | bladerunner=> ALTER TABLE expireTimes REPLICA IDENTITY FULL; |
| 274 | }}} |
| 275 | |
| 276 | After which deletes can happen again. I'm not sure that FULL is the optimal setting |
| 277 | for all cases, more reading may be required. |
| 278 | |
| 279 | We should now be able to see the publication on the master node : |
| 280 | |
| 281 | {{{ |
| 282 | bladerunner=> \dRp+ |
| 283 | Publication harrison |
| 284 | Owner | All tables | Inserts | Updates | Deletes |
| 285 | ---------+------------+---------+---------+--------- |
| 286 | deckard | f | t | t | t |
| 287 | Tables: |
| 288 | "public.expiretimes" |
| 289 | }}} |
| 290 | |
| 291 | |
| 292 | === PART 3 : SETUP ON THE SLAVE NODE === |
| 293 | |
| 294 | |
| 295 | Subscriptions must be done by the database super user, postgres. So log in to the slave |
| 296 | machine and go to the bladerunner database as postgres with : |
| 297 | |
| 298 | {{{ |
| 299 | $ psql -Upostgres bladerunner |
| 300 | }}} |
| 301 | |
| 302 | Then create a subscription that connects to the publication we have on the master machine : |
| 303 | |
| 304 | {{{ |
| 305 | bladerunner=# CREATE SUBSCRIPTION ford CONNECTION 'host=the.master.machine.name dbname=bladerunner user=deckard password=origami' PUBLICATION harrison; |
| 306 | }}} |
| 307 | |
| 308 | Note that you do need to specify the password, the .pgpass file does not affect the CREATE SUBSCRIPTION command. |
| 309 | I'm not sure what would happen if the password on the master node were to change. |
| 310 | |
| 311 | You should then be able to see the subscription : |
| 312 | |
| 313 | {{{ |
| 314 | bladerunner=# \dRs+ |
| 315 | List of subscriptions |
| 316 | Name | Owner | Enabled | Publication | Synchronous commit | Conninfo |
| 317 | ------+----------+---------+-------------+--------------------+--------------------------------------------------------------------------- |
| 318 | ford | postgres | t | {harrison} | off | host=the.master.machine.name dbname=bladerunner user=deckard password=origami |
| 319 | (1 row) |
| 320 | }}} |
| 321 | |
| 322 | And monitor it in more depth with the pg_stat_subscription table : |
| 323 | |
| 324 | {{{ |
| 325 | bladerunner=# SELECT * FROM pg_stat_subscription; |
| 326 | subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time |
| 327 | -------+---------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+------------------------------- |
| 328 | 16413 | ford | 22102 | | 0/199F168 | 2018-04-13 16:57:05.488469-06 | 2018-04-13 16:56:43.719121-06 | 0/199F168 | 2018-04-13 16:57:05.488469-06 |
| 329 | }}} |
| 330 | |
| 331 | Similarly on the master machine you should be able to access the database (this time there's no |
| 332 | need to be the postgres user) with : |
| 333 | |
| 334 | {{{ |
| 335 | $ psql -Udeckard bladerunner |
| 336 | }}} |
| 337 | |
| 338 | And then monitor the other end by looking at the pg_stat_replication table : |
| 339 | |
| 340 | {{{ |
| 341 | bladerunner=> select pid, usesysid, usename, application_name, client_addr, client_hostname, client_port, backend_start from pg_stat_replication; |
| 342 | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start |
| 343 | -------+----------+---------+------------------+---------------+--------------------------+-------------+------------------------------- |
| 344 | 31018 | 16384 | deckard | ford | 100.37.24.123 | the.slave.machine.name | 32826 | 2018-04-13 16:38:52.349168-06 |
| 345 | }}} |
| 346 | |
| 347 | And the two tables on the two machines should then be mirror images of each other, with updates on the master |
| 348 | popping up on the slave (also sometimes called the 'standby') machine. |
| 349 | |
| 350 | NOTE : If a new table is added to the publication on the master, the slave subscription will not see them |
| 351 | by default. In that case you would have to do this on the slave node : |
| 352 | |
| 353 | {{{ |
| 354 | bladerunner=# ALTER SUBSCRIPTION ford REFRESH PUBLICATION; |
| 355 | }}} |
| 356 | |
| 357 | Niles Oien April 2018. |