k8s搭建高可用pg集群

Ethereal Lv4

1. 下载csi

1
2
wget https://gh-proxy.com/github.com/CrunchyData/postgres-operator-examples/archive/refs/heads/main.zip
tar xvf main.zip

2. 修改部署

2.1 修改helm/postgres/values.yaml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 修改持久存储
instanceSize: 10Gi
instanceStorageClassName: "default-cephfs"
instanceReplicas: 3
backupsSize: 10Gi
backupsStorageClassName: "default-cephfs"
# 修改service
service:
type: NodePort
# 修改pg配置,启用远程登录
patroni:
dynamicConfiguration:
# synchronous_mode: true
postgresql:
pg_hba:
- host replication repl 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5

2.2 部署

1
2
helm install postgres /disk2/shared/build_offline_origin/pg/postgres-operator-examples-main/helm/install -n postgres
helm install postgres /disk2/shared/build_offline_origin/pg/postgres-operator-examples-main/helm/postgres -n postgres

3. 登录

3.1 获取账户和密码

1
2
3
4
5
6
# 获取登录uri
kubectl -n postgres get secrets postgres-pguser-postgres -o go-template='{{.data.uri | base64decode}}'
# 获取登录账户
kubectl -n postgres get secrets postgres-pguser-postgres -o go-template='{{.data.user | base64decode}}'
# 获取登录密码
kubectl -n postgres get secrets postgres-pguser-postgres -o go-template='{{.data.password | base64decode}}'

3.1 连接查看

注意,需要使用pg版本为17的client,ubuntu24安装的版本为16,不匹配

1
2
3
4
5
6
7
psql postgresql://postgresroot:E6%5Ea3%29zVD48mMNYaA%29bF%40wPv@localhost:32635
# 查看所有数据库
\l
# 新增数据库
create database argo_workflows;
# 删除数据库
drop database argo_workflows;

4. 拓展

4.1 Auto-Grow

Auto-Grow

1
2
3
# 修改helm/install/values.yaml
features:
AutoGrowVolumes: true

4.2 Logical Replication(PG性质,与PGO没什么关系)

Logical Replication

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# 与backups平级,在需要发布的集群上设置
users:
- name: logic
databases:
- zoo
options: "REPLICATION"


# 创建发布
kubectl exec -it -n postgres-operator -c database $(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=rhino,postgres-operator.crunchydata.com/role=master' -o name) -- psql zoo
# 创建权限
CREATE TABLE abc (id int PRIMARY KEY);
INSERT INTO abc SELECT * FROM generate_series(1,10);
GRANT SELECT ON abc TO logic;
# 发布
CREATE PUBLICATION zoo FOR ALL TABLES;
# 获取登录密钥
kubectl -n postgres-operator get secrets rhino-pguser-logic -o go-template='{{.data.host | base64decode}}'
kubectl -n postgres-operator get secrets rhino-pguser-logic -o go-template='{{.data.user | base64decode }}'
kubectl -n postgres-operator get secrets rhino-pguser-logic -o go-template='{{.data.password | base64decode }}'

# 创建订阅
kubectl exec -it -n postgres-operator -c database $(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' -o name) -- psql
# 创建数据库
CREATE TABLE abc (id int PRIMARY KEY);
# 订阅
CREATE SUBSCRIPTION zoo
CONNECTION 'host=rhino-primary.postgres-operator.svc user=logic dbname=zoo password=$LOGIC_PASSWORD'
PUBLICATION zoo;

4.3 Major Version Upgrade

Postgres Major Version Upgrade

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# 创建pgUpgrade对象,填入需要更新的集群名称
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PGUpgrade
metadata:
name: hippo-upgrade
spec:
postgresClusterName: hippo
fromPostgresVersion: 16
toPostgresVersion: 17

# 修改cluster注释,允许pgUpgrade对象更新,字段value为pgUpgrade对象的名称
kubectl -n postgres-operator annotate postgrescluster hippo postgres-operator.crunchydata.com/allow-upgrade="hippo-upgrade"

# 关闭集群,修改values.yaml并使用helm更新
shutdown: true

# 等待pgUpgrade进度完成,可以通过查看其状态来看


# 重启集群,修改values.yaml并使用helm更新
shutdown: false

4.4 备份、克隆、回滚、恢复与快照

备份将数据保存至pgbackrest指定的位置

回滚从pgbackrest中回滚

克隆、恢复从dataSource中克隆或恢复(仅限创建集群时)

快照将保存到快照的pvc中(指pv的快照)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
# 备份
## 关闭自动备份,给集群打下面的注释
postgres-operator.crunchydata.com/authorizeBackupRemoval="true"
## 多种备份方式,可以选volume,s3或gcs等,甚至可以多处同时备份
### 自己定义multiBackupRepos部分
## 修改备份频率
### 在template中找到
repos:
- name: repo1
volume:
volumeClaimSpec:
{{- if .Values.backupsStorageClassName }}
storageClassName: {{ .Values.backupsStorageClassName | quote }}
{{- end }}
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: {{ default "1Gi" .Values.backupsSize | quote }}
{{- end }}

### 在与volume平级添加:
schedules:
full: "0 1 * * 0"
differential: "0 1 * * 1-6"
## 手动备份
### 在repos平级添加
manual:
repoName: repo1
options:
- --type=full

### 触发备份
kubectl annotate -n postgres-operator postgrescluster hippo postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"


# 克隆
## 在values中添加dataSource指定如下
dataSource:
postgresCluster:
clusterName: hippo
repoName: repo1
options:
- --type=time
- --target="2021-06-09 14:15:11-04"

# 回滚,在values中添加pgBackRestConfig
restore:
enabled: true
repoName: repo1
options:
- --type=time
- --target="2021-06-09 14:15:11-04" # 指定时间


- --db-include=hippo # 可以指定特定的数据库

## 触发回滚
kubectl annotate -n postgres-operator postgrescluster hippo --overwrite postgres-operator.crunchydata.com/pgbackrest-restore="$(date)"

## 关闭回滚
restore:
enabled: false

# 恢复
## 修改helm/postgres/values.yaml,其中三个部分分别对应pgdata相对于PVC挂载根目录,pgwal如果与pgdata属于同一个pvc那么可以去掉,pgBackRestVolume对应上面的备份结果
## 当集群建立后,应该移除valume字段并重新部署
dataSource:
volumes:
pgDataVolume:
pvcName: oldhippo
directory: oldhippo
pgWALVolume:
pvcName: oldhippo-wal
pgBackRestVolume:
pvcName: oldhippo-pgbr-repo
directory: oldhippo-backrest-shared-repo



# 快照
## 检查或安装CRD
kubectl get crd volumesnapshotclasses.snapshot.storage.k8s.io
kubectl get crd volumesnapshotcontents.snapshot.storage.k8s.io
kubectl get crd volumesnapshots.snapshot.storage.k8s.io
## https://github.com/kubernetes-csi/external-snapshotter

## 启用PGO特性,修改helm/install/values.yaml
PGO_FEATURE_GATES="VolumeSnapshots=true"

## 在template中定制
spec:
backups:
snapshots:
volumeSnapshotClassName: <name of the snapshot class>

4.5 standby(只读集群)

从pgbackrest中读取

在values.yaml中修改即可

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# 从repo中建立
backups:
pgbackrest:
repos:
- name: repo1
s3:
bucket: "my-bucket"
endpoint: "s3.ca-central-1.amazonaws.com"
region: "ca-central-1"
standby:
enabled: true
repoName: repo1


# 从流中建立
pgbackrest:
repos:
- name: repo1
volume:
volumeClaimSpec: { accessModes: [ReadWriteOnce], resources: { requests: { storage: 1Gi } } }
customTLSSecret:
name: cluster-cert
customReplicationTLSSecret:
name: replication-cert
standby:
enabled: true
host: "192.0.2.2"
port: 5432


# 从流与repo中建立,注意repo必须是来自流的备份
backups:
pgbackrest:
repos:
- name: repo1
s3:
bucket: "my-bucket"
endpoint: "s3.ca-central-1.amazonaws.com"
region: "ca-central-1"
customTLSSecret:
name: cluster-cert
customReplicationTLSSecret:
name: replication-cert
standby:
enabled: true
repoName: repo1
host: "192.0.2.2"
port: 5432


# 转换standby为可读集群
standby:
enabled: false

4.6 Synchronous Replication

Synchronous Replication

1
2
3
patroni:
dynamicConfiguration:
synchronous_mode: true

功能:当写入另外一个节点时才会返回成功

4.7 PostGIS

PostGIS教程一:PostGIS介绍 - 知乎

使得pg可以记录地理信息

4.8 pgBouncer

1
2
# 只需要启用即可
pgBouncerReplicas: 1

4.9 Monitoring

1
monitoring: true

创建service

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2025-03-19T07:37:18Z"
name: postgres-monitoring
namespace: postgres
spec:
ports:
- name: monitoring
port: 9187
protocol: TCP
targetPort: 9187
sessionAffinity: None
type: NodePort
selector:
postgres-operator.crunchydata.com/role: master

现在即可在prometheus中设定job

1
2
3
4
- job_name: postgres
static_configs:
- targets:
- postgres.service.com:32480

4.10 分离式WAL设计

1
2
3
4
5
6
7
8
9
10
# 需要修改template
spec:
instances:
- name: instance
walVolumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 1Gi

4.11 大页

启用PG大页

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
name: hippo
spec:
postgresVersion: 17
instances:
- name: instance1
resources:
limits:
hugepages-2Mi: 16Mi
memory: 4Gi


spec:
patroni:
dynamicConfiguration:
postgresql:
parameters:
huge_page_size: 1GB

当在有大页的节点上不启用大页时将崩溃,解决方法除了启用PG大页之外还可以:

1
2
3
4
5
6
spec:
patroni:
dynamicConfiguration:
postgresql:
parameters:
shared_buffers: 128MB

4.12 Patroni

【DB宝89】PG高可用集群之Patroni + etcd + HAProxy + keepalived+普罗米修斯监控部署-腾讯云开发者社区-腾讯云

93427b861fcdb5b25000390de9744b9c

patroni是单主多备份架构(只有一个节点允许写入,其余通过流方式做备份,可读)

通过etcd中使用raft协议进行选主

在pgo架构中,首先有一个服务postgres-pods连接了所有节点,服务postgres-replicas连接了所有的备份(从节点),服务postgres-ha连向主节点,postgres-primary连向postgres-ha,而postgres-pgbouncer连向postgres-primary

Introduction — Patroni 4.0.5 documentation

1
2
synchronous_mode: on
synchronous_node_count: 2 # 设置同步时有多少个节点收到即可返回成功

4.13 PgBouncer

参数PgBouncer config

1
2
3
4
5
6
7
pool_mode:决定何时返回到连接池中,session结束后、transaction结束后、statement结束后
max_client_conn:最大连接数
default_pool_size:池大小
min_pool_size:当池剩余小于此数时补充
max_db_connections:允许连接到同一个数据库的服务端最大数目
max_db_client_connections:允许连接到同一个数据库的客户端最大值
server_round_robin:启用RR连接策略

5. 参考

Create a Postgres Cluster

在k8s上部署Crunchy Postgres for Kubernetes-CSDN博客

云原生 PostgreSQL 集群 - PGO:来自 Crunchy Data 的 Postgres Operator-阿里云开发者社区

Tutorials

Create a Postgres Cluster

Linux系统中PostgreSQL客户端的安装_linux安装pg客户端-CSDN博客

配置PostgreSQL允许远程连接的方法_pgsql远程访问权限-CSDN博客

PostgreSQL 配置远程访问 - 晓枫的春天 - 博客园

Patroni

浅析Patroni2.0 配置文件 - 墨天轮

patroni/postgres0.yml at master · patroni/patroni

使用 psql 列出 PostgreSQL 数据库和表_posgres创建数据库后,用什么命令显示数据库信息-CSDN博客

K8S - kubectl patch 实现非交互式修改 yaml 文件完成 pod 变动 - 公博义 - 博客园

k8s 服务 改为 nodeport k8s service配置_mob6454cc696f04的技术博客_51CTO博客

使用 psql 列出 PostgreSQL 数据库和表_posgres创建数据库后,用什么命令显示数据库信息-CSDN博客

Connect to a Postgres Cluster

Auto-Growable Disk

Logical Replication

Postgres Major Version Upgrade

Quickstart

kubernetes-csi/external-snapshotter: Sidecar container that watches Kubernetes Snapshot CRD objects and triggers CreateSnapshot/DeleteSnapshot against a CSI endpoint.

什么是 WAL - 知乎

PgBouncer config

RDS PostgreSQL内置连接池PgBouncer-阿里云开发者社区

Patroni官方给出的流程图-CSDN博客

Introduction — Patroni 4.0.5 documentation

【DB宝89】PG高可用集群之Patroni + etcd + HAProxy + keepalived+普罗米修斯监控部署-腾讯云开发者社区-腾讯云

使用 Patroni 实现 PostgreSQL 的高可用 - Redrock Postgres

Monitoring

https://github.com/CrunchyData/pgmonitor/blob/development/prometheus/linux/crunchy-prometheus.yml

prometheus + postgres_exporter监控postgresql_postgres-exporter-CSDN博客

  • Title: k8s搭建高可用pg集群
  • Author: Ethereal
  • Created at: 2025-03-10 17:59:18
  • Updated at: 2025-03-23 13:06:17
  • Link: https://ethereal-o.github.io/2025/03/10/k8s搭建高可用pg集群/
  • License: This work is licensed under CC BY-NC-SA 4.0.
 Comments