Upgrading a live PostgreSQL database in a Kubernetes environment can be a daunting task. Earlier this year, I faced this exact challenge: migrating an important PostgreSQL 15 database to version 16. As this was the first time I tackled an upgrade of this nature, I realized I would gain more from the experience by documenting it. Additionally, I hope to share insights that might help others navigate similar upgrades.
Fortunately, I collaborated with a knowledgeable colleague who was already well-versed in both PostgreSQL database maintenance and Kubernetes. His support was invaluable—thanks Robin!
Now, the plan was simple:
- Take a dump of the old database from the Kubernetes pod.
- Perform the upgrade locally.
- Upload the upgraded files to the Kubernetes pod.
- Update our deployment scripts to run with the upgraded database files.
## Preliminary Work
To do this job, I needed PostgreSQL 15, PostgreSQL 16, and `kubectl` installed on my system. Thus, having a tool like `mise` really comes in handy, as it makes it very easy to retrieve tools on demand[^1].
For instance, to do some very quick initial testing, I did:
```sh
mise shell postgres@15.5
initdb -D tmp
pg_ctl start -D tmp
pg_ctl stop -D tmp
```
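As an extra sanity check, one can also confirm which binaries the `mise` shell actually put on the `PATH`; these are standard commands, nothing specific to my setup:

```sh
# Confirm which PostgreSQL binaries are active in this shell
which pg_ctl
pg_ctl --version
initdb --version
```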
## First try
Spoiler alert: I needed two tries.
### Dump of the old database
I first tried performing a `pg_basebackup` against the old PostgreSQL server via port forwarding:
```sh
# Port forward between local 5432 and remote
kubectl port-forward service/my-db 5432

# Test that it works
psql --host localhost --port 5432 --user postgres postgres

# Take backup
pg_basebackup --host localhost --port 5432 --user postgres -D dump -Fp -Xs -P --checkpoint fast
```
As the docs explain, `pg_basebackup` is used to take a base backup of a running PostgreSQL database cluster. The backup is taken without affecting other clients of the database. It really feels like the natural tool here.
However, the process failed with a “connection reset”-type error. It was presumably some issue with networking or buffering in the port forward, but I don’t remember the specifics of the failure, since I figured I could just copy the files manually instead.
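In hindsight, one way to sidestep a flaky port forward might have been to run `pg_basebackup` inside the pod and copy the resulting archive out afterwards. A rough sketch, reusing the pod and namespace names from the commands below (untested in this exact form):

```sh
# Take the base backup inside the pod, avoiding the port forward entirely,
# then copy the resulting tar archive out of the pod
kubectl exec -n my-service-db my-db-pg15-0 -c postgres -- \
  pg_basebackup -U postgres -D /tmp/dump -Ft -z --checkpoint=fast
kubectl cp my-service-db/my-db-pg15-0:/tmp/dump -c postgres dump
```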
At the time, though, I simply tried to copy the old database files with `kubectl cp`:

```sh
kubectl cp my-service-db/my-db-pg15-0:/var/lib/postgresql/data/pgdata -c postgres db-old
```
I tested that I could start a server with the dumped files locally:
```sh
mise shell postgres@15.5
pg_ctl start -D db-old # failed first time!
rm db-old/postmaster.pid
pg_ctl start -D db-old # now it worked
pg_ctl stop -D db-old
```
It worked, except that I had to remove the `postmaster.pid` file before running `pg_ctl start`. The file was there because I had copied the files from a running database server, and PostgreSQL refuses to start when it finds an existing `postmaster.pid`.
### Upgrading the database locally
To upgrade the database locally, I used `pg_upgrade`, like this:
```sh
# Create a new DB that we will upgrade into
mise use postgres@16.6
initdb -D db-new -U postgres

# Upgrade
pg_upgrade -d db-old -D db-new \
  -b /home/lervag/.local/share/mise/installs/postgres/15.5/bin \
  -B /home/lervag/.local/share/mise/installs/postgres/16.6/bin
```
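Note that `pg_upgrade` does not carry over optimizer statistics, and it will typically remind you to regenerate them once the new cluster is running. In this local setup, that step would look roughly like this:

```sh
# Optimizer statistics are not migrated by pg_upgrade, so regenerate
# them in stages on the new cluster
pg_ctl start -D db-new
vacuumdb --all --analyze-in-stages -U postgres
pg_ctl stop -D db-new
```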
Then I copied the upgraded files back onto the Kubernetes pod:
```sh
kubectl cp db-new my-service-db/my-db-pg15-0:/var/lib/postgresql/data/pgdata-new -c postgres
```
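A side note: this copy temporarily doubles the space needed on the pod’s data volume, so it may be worth confirming that there is room, for example:

```sh
# Check free space on the data volume before keeping two clusters on it
kubectl exec -n my-service-db my-db-pg15-0 -c postgres -- \
  df -h /var/lib/postgresql/data
```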
### Upgrading the deployment scripts
Once the new files were in place, I went in and updated our `deployment.yml`: I upgraded to 16.6 and instructed it to start from `pgdata-new`.
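To check a rollout like this, something along these lines should confirm that the pod runs the new version from the new data directory (assuming the pod name is unchanged):

```sh
# Confirm the server version and the active data directory after the rollout
kubectl exec -n my-service-db my-db-pg15-0 -c postgres -- \
  psql -U postgres -c 'SHOW server_version;'
kubectl exec -n my-service-db my-db-pg15-0 -c postgres -- \
  psql -U postgres -c 'SHOW data_directory;'
```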
This seemingly worked fine. However, I observed two things that were not working as expected:
1. When I looked in the logs, I saw messages like

   ```
   WARNING: database "postgres" has no actual collation version, but a version was recorded
   ```

   This happened because I performed the upgrade on a computer with glibc, while the PostgreSQL server runs in Kubernetes on an Alpine Linux pod with `musl` libc. (A possible remedy is sketched after this list.)

2. When I tried to test by deploying to an environment in our project, I encountered a connection error. This turned out to be due to settings in both `pg_hba.conf` and `postgresql.conf` in the old version that were not carried over during the upgrade. The problem was resolved by fixing the configuration.
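For reference, had we kept this cluster, the usual remedy for the collation warning above would have been to refresh the recorded collation version for each affected database, along these lines:

```sh
# Refresh the recorded collation version so it matches the libc in use;
# repeat for each database that logs the warning
psql -U postgres -c 'ALTER DATABASE postgres REFRESH COLLATION VERSION;'
```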
Still, it was not satisfying to have the warnings in our logs, so we decided to try again with a clean slate.
## Second attempt
### Now with `pg_dump`
The next idea was based on `pg_dump`, which is another utility for backing up a PostgreSQL database. According to the docs, it makes consistent backups even if the database is being used concurrently, and it does not block other users. The main difference is that `pg_dump` only dumps a single database, and it does not include global objects such as roles. Further, `pg_dump` extracts the database into an SQL script file, possibly compressed and archived, while `pg_basebackup` creates a physical byte-for-byte backup of the entire database cluster’s files.
So, we created a new plan:
1. Use `pg_dump` to obtain a logical backup:

   ```sh
   kubectl port-forward service/my-db 5432
   pg_dump -C -U postgres -h localhost -p 5432 -Zgzip:9 --if-exists -c mydbname > db15.sql.gz
   ```

2. Create a new, fresh `StatefulSet` with PostgreSQL 16.

3. Add missing users with correct passwords, as well as grants:

   ```sql
   CREATE USER user1 WITH PASSWORD 'user1';
   CREATE USER user2 WITH PASSWORD 'user2';
   GRANT CONNECT ON DATABASE mydbname TO user1;
   GRANT CONNECT ON DATABASE mydbname TO user2;
   ```

   This is necessary because, as mentioned above, `pg_dump` does not include global objects such as roles.

4. Restore from the dump:

   ```sh
   kubectl port-forward service/my-db 5432
   mise shell postgres@16.6
   zcat db15.sql.gz | psql -U postgres -h localhost -p 5432 postgres -v ON_ERROR_STOP=1
   ```
I followed the plan, except that I used `kubectl cp` to copy the dump into the Kubernetes pod and then ran the `psql` command from a shell inside the pod. It worked fine!
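To confirm, a couple of spot checks along these lines would do (database and role names as in the plan above; a sketch, not verbatim from my notes):

```sh
# Spot-check the restored database: list its tables and the available roles
psql -U postgres -h localhost -p 5432 -d mydbname -c '\dt'
psql -U postgres -h localhost -p 5432 -c '\du'
```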
## Conclusions
All in all, I found this work to be interesting and I learned a lot from doing it. The main things I want to remember:
- I should not create the new, upgraded database on a different system from where it will be running. Understanding the environment, such as library differences between systems, is crucial!
- `pg_dump` proved more reliable in our scenario than `pg_basebackup`, especially when the source and target systems differ, e.g. in their libc implementations.
- Working alongside a knowledgeable colleague accelerated the learning process and helped navigate unexpected challenges.
Overall, these insights not only facilitated a successful upgrade but also equipped me with strategies for future infrastructure challenges.
As a final remark, I should mention I just noticed docker-pgautoupgrade. It looks awesome, and I’ll definitely look into it the next time I need to do a PostgreSQL upgrade!
[^1]: See also *The tools that I love: mise-en-place* for more reasons why I like `mise`!