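Keywords: CNI calico vxlan flannel ipv6-only ipv6+ipv4
While setting up an IPv6-only or dual-stack (IPv6+IPv4) k8s cluster, I found that the CNI plugin failed to start on the worker node after the node joined the cluster.
Here is what the startup failure looks like with calico:
kubectl get pod -A
The output is: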
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-79949b87d-ptq2r 1/1 Running 0 19m
kube-system calico-node-jbrn7 0/1 Init:CrashLoopBackOff 7 (40s ago) 14m
kube-system calico-node-xnwdx 1/1 Running 0 19m
kube-system coredns-6766b7b6bb-wc5j5 1/1 Running 0 20m
kube-system coredns-6766b7b6bb-wvg5w 1/1 Running 0 20m
kube-system etcd-myserver1 1/1 Running 0 20m
kube-system kube-apiserver-myserver1 1/1 Running 0 20m
kube-system kube-controller-manager-myserver1 1/1 Running 0 20m
kube-system kube-proxy-g8gxb 1/1 Running 0 20m
kube-system kube-proxy-lnddv 1/1 Running 0 14m
kube-system kube-scheduler-myserver1 1/1 Running 0 20m
Inspect the failing pod calico-node-jbrn7 in detail:
kubectl describe pod -n kube-system calico-node-jbrn7
The output is similar to the following:
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  20m                  default-scheduler  Successfully assigned kube-system/calico-node-jbrn7 to worker1
  Normal   Pulled     20m                  kubelet            Container image "docker.io/calico/cni:v3.29.3" already present on machine
  Normal   Created    20m                  kubelet            Created container: upgrade-ipam
  Normal   Started    20m                  kubelet            Started container upgrade-ipam
  Normal   Created    6m40s (x8 over 20m)  kubelet            Created container: install-cni
  Normal   Started    6m40s (x8 over 20m)  kubelet            Started container install-cni
  Normal   Pulled     68s (x9 over 20m)    kubelet            Container image "docker.io/calico/cni:v3.29.3" already present on machine
  Warning  BackOff    11s (x76 over 20m)   kubelet            Back-off restarting failed container install-cni in pod calico-node-jbrn7_kube-system(a888f1ad-ec45-4207-94ac-f2953bda9d0e)
The Events show that the failure occurs while running the container named install-cni in the pod.
The log of the install-cni container contains the following:
2025-05-29 09:53:04.523 [INFO][1] cni-installer/install.go 233: CNI plugin version: v3.29.3
2025-05-29 09:53:04.523 [INFO][1] cni-installer/install.go 185: /host/secondary-bin-dir is not writeable, skipping
2025-05-29 09:53:04.523 [WARNING][1] cni-installer/winutils.go 150: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2025-05-29 09:53:34.524 [ERROR][1] cni-installer/token_watch.go 108: Unable to create token for CNI kubeconfig error=Post "https://[fd15:4ba5:5a2b:1008:2000::1]:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp [fd15:4ba5:5a2b:1008:2000::1]:443: i/o timeout
2025-05-29 09:53:34.524 [FATAL][1] cni-installer/install.go 478: Unable to create token for CNI kubeconfig error=Post "https://[fd15:4ba5:5a2b:1008:2000::1]:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp [fd15:4ba5:5a2b:1008:2000::1]:443: i/o timeout
In other words, the container cannot connect to the API Server's clusterIP, [fd15:4ba5:5a2b:1008:2000::1]:443.
The same setup does not show this problem with IPv4.
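One way to confirm this diagnosis from the affected worker node is to compare TCP reachability of the clusterIP endpoint with that of the API Server's node address. The following is a minimal Python sketch, not part of the original setup; the helper name can_connect is hypothetical and the timeout is an arbitrary choice.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection accepts IPv4 and IPv6 literals (without brackets)
        # as well as host names.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

On the worker node, can_connect("fd15:4ba5:5a2b:1008:2000::1", 443) returning False would be consistent with the i/o timeout in the log above. If a probe of the API Server node's own IPv6 address on port 6443 returns True, only the clusterIP path is broken.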
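Skipping the underlying theory, here is the fix:
- Modify the calico YAML file so that calico-node connects to the IPv6 address of the node that hosts the API Server, that is, the IPv6 address shown on that node by
ip a
Add a new ConfigMap object named kubernetes-services-endpoint to calico.yaml, as shown below: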
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  # Node IP of the API Server
  KUBERNETES_SERVICE_HOST: "fd15:4ba5:5a2b:1008:192:168:186:40"
  KUBERNETES_SERVICE_PORT: "6443"
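Notes:
- 1. You must create a new ConfigMap object; adding these keys to the existing ConfigMap named calico-config does not work.
- 2. The new ConfigMap object must be named kubernetes-services-endpoint, because the stock calico.yaml loads KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT from an optional ConfigMap with exactly this name.
- 3. For IPv6 single-stack clusters and IPv6-primary dual-stack clusters, other related environment variables must also be set in the DaemonSet in calico.yaml; they are not covered here.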
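The value of KUBERNETES_SERVICE_HOST is the API Server node's own IPv6 address as reported by ip a. As a rough sketch, assuming the JSON output that iproute2 prints for ip -j -6 addr show, the global-scope candidates can be listed as follows. The function name and the idea of filtering by scope are illustrative, not part of the original procedure.

```python
import json
from typing import List, Tuple


def global_ipv6_addresses(ip_json: str) -> List[Tuple[str, str]]:
    """Return (interface, address) pairs for global-scope IPv6 addresses.

    ip_json is the text printed by `ip -j -6 addr show`.
    """
    result = []
    for link in json.loads(ip_json):
        for info in link.get("addr_info", []):
            # Keep only scope "global"; this drops link-local (fe80::/10)
            # and loopback entries, which are not usable as the API Server address.
            if info.get("family") == "inet6" and info.get("scope") == "global":
                result.append((link.get("ifname", ""), info["local"]))
    return result
```

The input text can be captured on the API Server node with subprocess.run(["ip", "-j", "-6", "addr", "show"], capture_output=True, text=True, check=True).stdout. If several addresses are returned, pick the one the worker nodes can actually reach.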
Below is the corresponding change for flannel.
In kube-flannel.yml, edit the env section of the DaemonSet and add the environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, as shown below:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-ds
  namespace: kube-flannel
spec:
  selector:
    matchLabels:
      app: flannel
      k8s-app: flannel
  template:
    metadata:
      labels:
        app: flannel
        k8s-app: flannel
        tier: node
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - args:
        - --ip-masq
        - --kube-subnet-mgr
        command:
        - /opt/bin/flanneld
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        - name: FLANNELD_IFACE
          value: "ens33"
        # Node IP address of the API Server
        - name: KUBERNETES_SERVICE_HOST
          value: "fd15:4ba5:5a2b:1008:192:168:186:40"
        - name: KUBERNETES_SERVICE_PORT
          value: "6443"
        image: ghcr.io/flannel-io/flannel:v0.26.7
        name: kube-flannel
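Both the calico and the flannel change set KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, the variables an in-cluster client reads to locate the API Server. The host value is written without square brackets, although the URL in the failing log has them; the client adds the brackets when it joins host and port. The following Python sketch only illustrates that joining step. It is not the actual client code, and api_server_url is a hypothetical name.

```python
import ipaddress


def api_server_url(host: str, port: str) -> str:
    """Build an https URL from a host and a port, bracketing IPv6 literals."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (for example a DNS name); use it unchanged.
        is_ipv6 = False
    return f"https://[{host}]:{port}" if is_ipv6 else f"https://{host}:{port}"
```

With the default in-cluster values from the log, api_server_url("fd15:4ba5:5a2b:1008:2000::1", "443") gives https://[fd15:4ba5:5a2b:1008:2000::1]:443, the endpoint that timed out. With the overridden values it gives https://[fd15:4ba5:5a2b:1008:192:168:186:40]:6443, which points directly at the API Server node.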