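Background:
A master node's disk filled up and took the Kubernetes services down. After a reboot, kubelet refused to start no matter what, so I decided to run kubeadm reset on the node and join it to the cluster again. The join then failed with the error below, which says the etcd cluster health check did not pass: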
error execution phase check-etcd: error syncing endpoints with etc: dial tcp 172.31.182.152:2379: connect: connection refused
Solution:
1. Remove the entry for the etcd node that no longer exists from the kubeadm-config ConfigMap:
kubectl edit configmaps -n kube-system kubeadm-config
cn-hongkong.i-j6caps6av1mtyxyofmrw:
  advertiseAddress: 172.31.182.152
  bindPort: 6443
Delete the entry shown above.
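Before editing, it can help to confirm which entry is stale. The following is a minimal sketch, not part of the original procedure. stale_api_endpoints is a hypothetical helper name. It assumes a kubeadm version that still stores ClusterStatus in the kubeadm-config ConfigMap (newer kubeadm releases dropped that field) and that awk is available on the machine where kubectl runs.

```shell
# Hypothetical helper: print the node names in ClusterStatus whose
# advertiseAddress equals the given IP.
# Assumption: .data.ClusterStatus exists in the kubeadm-config ConfigMap.
stale_api_endpoints() {
  kubectl -n kube-system get configmap kubeadm-config \
    -o jsonpath='{.data.ClusterStatus}' |
  awk -v ip="$1" '
    # A line that is only "key:" starts a nested block; remember its key.
    /^[[:space:]]*[^[:space:]]+:[[:space:]]*$/ { name = $1; sub(/:$/, "", name); next }
    # Print the remembered key when the address under it matches.
    $1 == "advertiseAddress:" && $2 == ip { print name }
  '
}
```

For the failure described here, the call would be stale_api_endpoints 172.31.182.152, and any name it prints is the entry to delete in the editor.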
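2. Because I built the cluster with kubeadm, etcd runs as a pod on every master node, one instance per control-plane node. When the k8s-001 node was removed, the etcd cluster did not automatically remove that node's etcd member, so it has to be removed by hand.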
Note that you first need to open a shell inside one of the remaining etcd pods.
kubectl exec -it -n kube-system etcd-cn-hongkong.i-j6caps6av1mtyxyofmrx -- sh
export ETCDCTL_API=3
alias etcdctl='etcdctl --endpoints=https://172.31.182.153:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d4322ce19cc3f8da, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
# Remove the member that belongs to the node that no longer exists
/ # etcdctl member remove d4322ce19cc3f8da
Member d4322ce19cc3f8da removed from cluster ed812b9f85d5bcd7
/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
# After the node has been reset and joined again, it shows up in the member list with a new member ID
/ # etcdctl member list
cd4e1e075b1904b2, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
/ # exit
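Picking the member ID out of the list by eye is easy to get wrong, so the lookup in the session above can be sketched as a small filter. This is only an illustration: member_id_for_ip is a hypothetical helper name, and it assumes the comma-separated etcdctl member list output shown above (peer URL in the fourth field, peer port 2380) and that awk is available wherever the filter runs.

```shell
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the ID of the member whose peer URL is https://<ip>:2380.
# Assumption: fields are separated by ", " and the peer URL is field 4.
member_id_for_ip() {
  awk -F', ' -v peer="https://$1:2380" '$4 == peer { print $1 }'
}
```

Piping the list through it, as in etcdctl member list | member_id_for_ip 172.31.182.152, prints the ID to hand to etcdctl member remove. Checking the printed ID against the full list before removing anything is still worthwhile.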
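Finally, after every failed kubeadm join you have to run kubeadm reset on the node before trying again; only then will kubeadm join succeed.
Let's keep learning together!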
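The reset-then-join order can be sketched as a small wrapper so that the reset is never skipped. This is an illustration under assumptions: rejoin_node is a hypothetical name, the join arguments are passed through unchanged and must be whatever your own cluster's join command requires, and kubeadm reset -f skips the confirmation prompt, so it should only be run on the node that is being rebuilt.

```shell
# Hypothetical wrapper: wipe the state left by a previous failed join,
# then run kubeadm join with the caller's arguments.
# The join only runs if the reset succeeded.
rejoin_node() {
  kubeadm reset -f && kubeadm join "$@"
}
```

It would be called with the same arguments you would normally give to kubeadm join for this cluster.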
Source: http://1t.click/aCXa