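Background

In the early days of our system we used Tencent Cloud CLB load balancers for blue-green releases. By design, CLB can only bind to Tencent Cloud CVM instances, which makes it a poor fit for a Serverless cluster architecture. To work around this, I tried deploying NGINX Ingress Controller on Tencent Cloud to implement canary releases.

Goal

Explore using NGINX Ingress Controller to implement canary releases.

Deployment

Prepare the Nginx test workloads

Deploy two workloads, nginx-v1 and nginx-v2, as test cases, using openresty/openresty:centos as the base image.

The nginx-v1 manifests are as follows.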

# StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    k8s-app: nginx
    qcloud-app: nginx
    version: v1
  name: nginx-v1
  namespace: default
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: nginx
      qcloud-app: nginx
      version: v1
  serviceName: ""
  template:
    metadata:
      labels:
        k8s-app: nginx
        qcloud-app: nginx
        version: v1
    spec:
      containers:
      - env:
        - name: TZ
          value: Asia/Shanghai
        image: openresty/openresty:centos
        imagePullPolicy: IfNotPresent
        name: nginx
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 512Mi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/openresty/nginx/conf/nginx.conf
          name: conf
          subPath: nginx.conf
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: nginx-v1
        name: conf
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
# Service
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: nginx
    qcloud-app: nginx
  name: nginx-v1
  namespace: default
spec:
  clusterIP: 10.0.0.1
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    k8s-app: nginx
    qcloud-app: nginx
    version: v1
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    k8s-app: nginx
    qcloud-app: nginx
    version: v1
  name: nginx-v1
  namespace: default
data:
  nginx.conf: |
    worker_processes auto;
    error_log /usr/local/openresty/nginx/logs/error.log warn;
    pid /var/run/nginx.pid;

    events {
      accept_mutex on;
      multi_accept on;
      use epoll;
      worker_connections 1024;
    }

    http {
      sendfile on;
      gzip on;
      keepalive_timeout 30;

      ignore_invalid_headers off;
      server {
        listen 80;
        location / {
          # Reply with the version name so each response identifies its backend.
          content_by_lua_block {
            ngx.say("nginx-v1")
          }
        }
      }

      include /etc/nginx/conf.d/*.conf;
    }

The nginx-v2 manifests are as follows.

# StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    k8s-app: nginx
    qcloud-app: nginx
    version: v2
  name: nginx-v2
  namespace: default
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: nginx
      qcloud-app: nginx
      version: v2
  serviceName: ""
  template:
    metadata:
      labels:
        k8s-app: nginx
        qcloud-app: nginx
        version: v2
    spec:
      containers:
      - env:
        - name: TZ
          value: Asia/Shanghai
        image: openresty/openresty:centos
        imagePullPolicy: IfNotPresent
        name: nginx
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 512Mi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/openresty/nginx/conf/nginx.conf
          name: conf
          subPath: nginx.conf
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: nginx-v2
        name: conf
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
# Service
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: nginx
    qcloud-app: nginx
  name: nginx-v2
  namespace: default
spec:
  clusterIP: 10.0.0.2
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    k8s-app: nginx
    qcloud-app: nginx
    version: v2
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    k8s-app: nginx
    qcloud-app: nginx
    version: v2
  name: nginx-v2
  namespace: default
data:
  nginx.conf: |
    worker_processes auto;
    error_log /usr/local/openresty/nginx/logs/error.log warn;
    pid /var/run/nginx.pid;

    events {
      accept_mutex on;
      multi_accept on;
      use epoll;
      worker_connections 1024;
    }

    http {
      sendfile on;
      gzip on;
      keepalive_timeout 30;

      ignore_invalid_headers off;
      server {
        listen 80;
        location / {
          # Reply with the version name so each response identifies its backend.
          content_by_lua_block {
            ngx.say("nginx-v2")
          }
        }
      }

      include /etc/nginx/conf.d/*.conf;
    }

Create the NGINX Ingress Controller

apiVersion: cloud.tencent.com/v1alpha1
kind: NginxIngress
metadata:
  name: nginx-ingress
spec:
  ingressClass: nginx-ingress
  service:
    type: LoadBalancer
  watchNamespace: default
  workLoad:
    hpa:
      enable: true
      maxReplicas: 2
      metrics:
      - pods:
          metricName: k8s_pod_rate_cpu_core_used_limit
          targetAverageValue: "80"
        type: Pods
      minReplicas: 1
    template:
      affinity: {}
      container:
        image: shjrccr.ccs.tencentyun.com/paas/nginx-ingress-controller:v0.49.3
        resources:
          limits:
            cpu: "0.25"
            memory: 512Mi
          requests:
            cpu: "0.25"
            memory: 512Mi
    type: deployment

The corresponding ConfigMap is as follows.

apiVersion: v1
data:
  access-log-path: /var/log/nginx/nginx_access.log
  allow-snippet-annotations: "false"
  error-log-path: /var/log/nginx/nginx_error.log
  keep-alive-requests: "10000"
  log-format-upstream: $remote_addr - $remote_user [$time_iso8601] $msec "$request"
    $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time
    [$proxy_upstream_name] [$proxy_alternative_upstream_name] [$upstream_addr] [$upstream_response_length]
    [$upstream_response_time] [$upstream_status] $req_id
  max-worker-connections: "65536"
  upstream-keepalive-connections: "200"
kind: ConfigMap
metadata:
  labels:
    k8s-app: nginx-ingress-ingress-nginx-controller
    qcloud-app: nginx-ingress-ingress-nginx-controller
  manager: tke-nginx-ingress-controller
  name: nginx-ingress-ingress-nginx-controller
  namespace: default

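The Tencent Cloud console shows that the NGINX Ingress instance has been deployed successfully.

Create the NGINX Ingress

Create nginx-ingress, pointing at the nginx-v1 Service. Client requests are routed to this Ingress by default.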

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx-ingress
    kubernetes.io/ingress.rule-mix: "false"
    kubernetes.io/ingress.subnetId: subnet-1234567
  name: nginx-ingress
  namespace: default
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: nginx-v1 # the Service this Ingress routes to
          servicePort: 80
        pathType: ImplementationSpecific
status:
  loadBalancer:
    ingress:
    - ip: 172.28.84.54

Create nginx-ingress-canary, pointing at the nginx-v2 Service. The nginx.ingress.kubernetes.io/canary: "true" annotation marks it as the canary Ingress.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx-ingress
    kubernetes.io/ingress.rule-mix: "false"
    kubernetes.io/ingress.subnetId: subnet-1234567
    # enable canary routing
    nginx.ingress.kubernetes.io/canary: "true"
  name: nginx-ingress-canary
  namespace: default
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: nginx-v2 # the Service this Ingress routes to
          servicePort: 80
        pathType: ImplementationSpecific
status:
  loadBalancer:
    ingress:
    - ip: 172.28.84.54

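The Tencent Cloud console shows that nginx-ingress and nginx-ingress-canary share the same access IP, 172.28.84.54.

Verify A/B testing

Use curl to check whether traffic splitting works.

Before traffic splitting is enabled, curl requests return the following.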

[root@localhost ~]# for i in {1..10}; do curl http://172.28.84.54; done;
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1

Add the following annotations to nginx-ingress-canary.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx-ingress
    nginx.ingress.kubernetes.io/canary: "true"
    # header-based traffic splitting
    nginx.ingress.kubernetes.io/canary-by-header: region
    nginx.ingress.kubernetes.io/canary-by-header-pattern: gz|sz # requests whose region header matches gz or sz are routed to this Ingress

Sending curl requests with and without a region header gives the following results.

[root@localhost ~]# curl http://172.28.84.54;
nginx-v1
[root@localhost ~]# curl -H "region: gz" http://172.28.84.54;
nginx-v2
[root@localhost ~]# curl -H "region: bj" http://172.28.84.54;
nginx-v1
[root@localhost ~]# curl -H "region: sz" http://172.28.84.54;
nginx-v2

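Verification passed. The annotation nginx.ingress.kubernetes.io/canary-by-header-pattern: gz|sz means the new version is being rolled out to the gz and sz regions: when a client sends a region header with one of these values, its traffic goes to the canary Ingress.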
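To make the matching rule concrete, here is a rough shell sketch of the per-request decision. It is an approximation, not the controller's code: matches_canary_pattern is a hypothetical helper, and grep -E stands in for the regular-expression engine ingress-nginx uses. The sketch assumes the pattern is applied unanchored, so a value such as gzx would also match gz|sz; anchoring the pattern as ^(gz|sz)$ restricts it to exact values.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the header value matches the
# canary pattern, i.e. when the request would be sent to the canary backend.
# Assumption: the pattern is matched unanchored, approximated with grep -E.
matches_canary_pattern() {
  value=$1
  pattern=$2
  printf '%s\n' "$value" | grep -Eq -- "$pattern"
}

# Report which backend each sample header value would reach.
for region in gz bj sz; do
  if matches_canary_pattern "$region" 'gz|sz'; then
    echo "region=$region -> nginx-v2 (canary)"
  else
    echo "region=$region -> nginx-v1 (stable)"
  fi
done
```

The loop mirrors the three curl requests above: gz and sz are reported as canary, bj as stable.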

Verify canary release

A canary release shifts traffic to the new version gradually by weight, controlled by the nginx.ingress.kubernetes.io/canary-weight annotation.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx-ingress
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
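
In practice the weight is raised in stages instead of being set once. As a sketch, the hypothetical helper below only prints the kubectl annotate commands for a schedule passed as arguments and does not run them. The 10/30/50/100 schedule is an assumption for illustration; the namespace and Ingress name are the ones used in this article. The rest of this section stays at a weight of 10.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) one kubectl command per stage.
# $1 is the canary Ingress name; the remaining arguments are the weights.
print_rollout_steps() {
  ingress=$1
  shift
  for weight in "$@"; do
    printf 'kubectl -n default annotate ingress %s nginx.ingress.kubernetes.io/canary-weight=%s --overwrite\n' \
      "$ingress" "$weight"
  done
}

# Assumed schedule; review the output, then run each command after checking
# the canary's error rate at the previous stage.
print_rollout_steps nginx-ingress-canary 10 30 50 100
```

Printing the commands keeps the sketch safe to run anywhere; piping the output to sh would apply every stage back to back, which defeats the purpose of staging.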

Send curl requests; the responses are as follows.

[root@localhost ~]# for i in {1..10}; do curl http://172.28.84.54; done;
nginx-v1
nginx-v2
nginx-v1
nginx-v2
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1
nginx-v1

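Verification passed: 2 of the 10 requests reached the canary version. With a sample this small, the ratio is only a rough indication of the configured 10% weight.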
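
A larger sample gives a better estimate of the split. As a sketch, the hypothetical tally_versions helper below counts response bodies read from standard input and prints each one with its count and integer percentage. It assumes every response is a single line such as nginx-v1, as returned by the test workloads in this article.

```shell
#!/bin/sh
# Hypothetical helper: tally identical lines from stdin and print
# "<body> <count> <percent>%", sorted by body. Blank lines are ignored.
tally_versions() {
  awk 'NF { count[$0]++; total++ }
       END {
         for (k in count)
           printf "%s %d %d%%\n", k, count[k], count[k] * 100 / total
       }' | sort
}

# Usage against the Ingress address from this article (needs network access):
#   for i in $(seq 1 200); do curl -s http://172.28.84.54; done | tally_versions
```

The percentage is truncated to an integer, which is enough to see whether the observed share is close to the configured weight.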

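Summary

Implementing A/B testing and canary releases with NGINX Ingress Controller is fairly simple, but the NGINX reload problem is still not fully solved. I lean toward using APISIX Ingress Controller for canary releases.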