<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Joseph Mckulka | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/joseph-mckulka/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/joseph-mckulka</link>
<image><url>https://www.crunchydata.com/build/_assets/default.png-W4XGD4DB.webp</url>
<title>Joseph Mckulka | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/joseph-mckulka</link>
<width>256</width>
<height>256</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 17 Nov 2022 10:00:00 EST</pubDate>
<dc:date>2022-11-17T15:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Multi-Cloud Strategies with Crunchy Postgres for Kubernetes ]]></title>
<link>https://www.crunchydata.com/blog/multi-cloud-strategies-with-crunchy-postgres-for-kubernetes</link>
<description><![CDATA[ PGO now supports streaming replication between clusters which can be used for disaster recovery or cloud migration. Read about how to set up streaming replication in a declarative cloud-native way. This example includes code samples for setting up two clusters, one in EKS, and one in GKE. ]]></description>
<content:encoded><![CDATA[ <p><a href=https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes>Crunchy Postgres for Kubernetes</a> can be used for cross-datacenter streaming replication out of the box. With so many folks asking for cross-cloud / cross-datacenter replication, we wanted to give a detailed explanation of how it works. In this post, we use streaming replication, prioritizing low latency and stability.<p>Cross-cloud streaming replication can be used:<ul><li>To enable multi-cloud disaster recovery<li>For moving clusters between cloud providers<li>For moving clusters between on-premises and cloud</ul><p>Given the power of this feature, we decided to incorporate streaming replication directly into <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart/>PGO</a>. With the <a href=https://access.crunchydata.com/documentation/postgres-operator/5.2.0/releases/5.2.0/>5.2 release</a>, this is easily configurable through the <code>postgrescluster</code> spec, with no manual Postgres configuration needed to set up streaming replication.<h2 id=setup-cloud-environments><a href=#setup-cloud-environments>Set Up Cloud Environments</a></h2><p>In this sample scenario, we will create <code>postgresclusters</code> in both EKS and GKE. EKS will be our primary environment, and GKE will host a standby. PGO will need to be deployed in both EKS and GKE to create <code>postgresclusters</code> in each environment.<p>The standby database needs to connect directly to the primary database over the network. This means the primary environment (EKS) needs to be able to create services with an external IP. In this example, we are using the <code>LoadBalancer</code> service type, which is easily configurable through the <code>postgrescluster</code> spec.<p>Both <code>postgresclusters</code> will need copies of the same TLS certificates to allow replication. 
Please look at the <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/day-two/customize-cluster#customize-tls>custom TLS</a> section of our docs for guidance on creating custom cert secrets in the format that PGO expects. This will need to be done in both environments. In this example, we have copies of the <code>cluster-cert</code> and <code>replication-cert</code> secrets in both Kubernetes environments.<h2 id=create-clusters><a href=#create-clusters>Create Clusters</a></h2><p>Now that our cloud environments are configured, we can create the primary and standby clusters. First, we will create the primary cluster and allow it to start up. Then we take note of the external IP created for the primary service. After we have the IP, we can create our standby cluster.<h3 id=primary><a href=#primary>Primary</a></h3><p>For the primary, we create a <code>postgrescluster</code> with the following spec. We have defined the custom TLS certs that we created in both environments. We also specified that the service that exposes the PostgreSQL primary instance should have the type <code>LoadBalancer</code>.<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: primary
  namespace: postgres-operator
spec:
  service:
    type: LoadBalancer
  postgresVersion: 14
  customTLSSecret:
    name: cluster-cert
  customReplicationTLSSecret:
    name: replication-cert
  instances:
    - name: instance1
      replicas: 1
      dataVolumeClaimSpec:
        {
          accessModes: [ReadWriteOnce],
          resources: { requests: { storage: 1Gi } },
        }
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              {
                accessModes: [ReadWriteOnce],
                resources: { requests: { storage: 1Gi } },
              }
</code></pre><p>After you create a <code>postgrescluster</code> with this spec, wait for the initial backup to complete and the cluster to become ready. Once it is, your primary is up and you can start setting up the standby. Before you switch to the GKE cluster, you will need the external IP from the <code>primary-ha</code> service.<pre><code class=language-bash>$ kubectl get svc
NAME                TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)          AGE
primary-ha          LoadBalancer   10.100.4.48      a078e7d173f214d9ca0e7d122052aa5a-1097707392.us-east-1.elb.amazonaws.com   5432:30985/TCP
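# To capture the external hostname directly instead of copying it by hand
# (a convenience step; the service name comes from the output above):
$ kubectl get svc primary-ha -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
a078e7d173f214d9ca0e7d122052aa5a-1097707392.us-east-1.elb.amazonaws.com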
</code></pre><h3 id=standby><a href=#standby>Standby</a></h3><p>Now that we have the primary cluster, we can create our standby. Here we are using the <code>spec.standby</code> fields in the <code>PostgresCluster</code> spec. When filling out the standby spec, we have a few options. You can provide a <code>host</code>, a <code>repoName</code>, or both. In this scenario, we are using streaming replication and will need to provide a <code>host</code>. The <code>host</code> in the spec below is the external IP we copied from the primary-ha service.<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: standby
  namespace: postgres-operator
spec:
  standby:
    enabled: true
    host: a078e7d173f214d9ca0e7d122052aa5a-1097707392.us-east-1.elb.amazonaws.com
  postgresVersion: 14
  customTLSSecret:
    name: cluster-cert
  customReplicationTLSSecret:
    name: replication-cert
  instances:
    - name: instance1
      replicas: 1
      dataVolumeClaimSpec:
        {
          accessModes: [ReadWriteOnce],
          resources: { requests: { storage: 1Gi } },
        }
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              {
                accessModes: [ReadWriteOnce],
                resources: { requests: { storage: 1Gi } },
              }
</code></pre><p>The standby cluster will look slightly different from the primary. You can expect the standby to have instance pods (one for every replica defined in the spec) and a repo-host pod. One thing you will not see is an initial backup on the cluster.<p><img alt="multi-cloud streaming replication" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/f8f0b47d-6646-492e-2af3-d99692a4cf00/public><h2 id=verify-streaming-replication><a href=#verify-streaming-replication>Verify Streaming Replication</a></h2><p>Now that we have a standby using streaming replication, it is a good time to check that replication is configured correctly and working as expected. The first thing to check is that any data you create on the primary is replicated to the standby. The time this takes will depend on network latency and the size of the data. If you see data from your primary database, streaming replication is active and you are good to go.<p>If you have exec privileges in your Kubernetes cluster, there are a few commands you can use to verify data replication and streaming. In the following commands, we exec into the standby database, check that the <code>walreceiver</code> process is running, and check that we have a streaming status in <code>pg_stat_wal_receiver</code>.<pre><code class=language-bash>$ kubectl exec -it standby-instance1-bkbl-0 -c database -- bash
bash-4.4$ ps -U postgres -x | grep walreceiver
     95 ?        Ss     0:10 postgres: standby-ha: walreceiver streaming 0/A000000
bash-4.4$ psql -c "select pid,status,sender_host from pg_stat_wal_receiver;"
 pid |  status   |                               sender_host
-----+-----------+-------------------------------------------------------------------------
  95 | streaming | a078e7d173f214d9ca0e7d122052aa5a-1097707392.us-east-1.elb.amazonaws.com
(1 row)
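
# You can also verify from the primary side: pg_stat_replication should show
# the standby connected, with state reporting streaming (run this in the
# primary cluster's database container):
bash-4.4$ psql -c "select client_addr, state, sync_state from pg_stat_replication;"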
</code></pre><h2 id=promote-the-standby><a href=#promote-the-standby>Promote the Standby</a></h2><p>Now that you can see your data being replicated from the primary to the standby, you are ready to promote the standby in a disaster scenario. This is done by updating the spec of the standby cluster so that <code>standby.enabled</code> is <code>false</code>, or by removing the standby section entirely.<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: standby
spec:
  standby:
    enabled: false
</code></pre><p>After you promote the standby, it will work as a fully functioning <code>postgrescluster</code> that you can back up, scale, and use as you would expect. You can also use the new primary to create another standby cluster!<h2 id=conclusion><a href=#conclusion>Conclusion</a></h2><p>If you've been looking for a solution for streaming replication, you may have come across Brian Pace's article earlier this year on <a href=https://www.crunchydata.com/blog/streaming-replication-across-kubernetes-postgres-clusters>Streaming Replication using pgBackRest</a>. I'm excited that with PGO 5.2, this is even easier to set up. Streaming replication adds another tool to our operator to allow customers to find the <a href=https://access.crunchydata.com/documentation/postgres-operator/v5/tutorial/disaster-recovery/>disaster recovery solutions</a> that meet their needs. ]]></content:encoded>
<category><![CDATA[ Kubernetes ]]></category>
<author><![CDATA[ Joseph.Mckulka@crunchydata.com (Joseph Mckulka) ]]></author>
<dc:creator><![CDATA[ Joseph Mckulka ]]></dc:creator>
<guid isPermalink="false">8ac9a7c58a1ca2513c5fbda65eb5e8cfdac5ecc87bf78f919138e02454c0030c</guid>
<pubDate>Thu, 17 Nov 2022 10:00:00 EST</pubDate>
<dc:date>2022-11-17T15:00:00.000Z</dc:date>
<atom:updated>2022-11-17T15:00:00.000Z</atom:updated></item></channel></rss>