Fixing a Kubernetes master that cannot join the etcd cluster



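Background:
The disk on one master filled up and took the k8s services down. After a reboot the kubelet still refused to start, so I decided to reset the node and join it to the cluster again. The join then failed with the error below, which says that the etcd cluster health check did not pass: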


error execution phase check-etcd: error syncing endpoints with etc: dial tcp 172.31.182.152:2379: connect: connection refused


Solution:

1. Remove the stale entry for the node that no longer exists from kubeadm-config:

kubectl edit configmaps -n kube-system kubeadm-config


cn-hongkong.i-j6caps6av1mtyxyofmrw:
  advertiseAddress: 172.31.182.152
  bindPort: 6443


Delete the block above, then save and exit the editor.
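
To check whether the stale entry is still present without opening an editor, something like the following could be used. This is only a sketch: it assumes the ConfigMap stores the endpoint list under a ClusterStatus key (the layout shown above; newer kubeadm releases may no longer keep this key), and the function name has_api_endpoint is made up for illustration.

```shell
# has_api_endpoint IP
# Reads ClusterStatus YAML on stdin and succeeds (exit 0) if some entry
# still advertises the given IP. Hypothetical helper, not part of kubeadm.
has_api_endpoint() {
    awk -v ip="$1" '
        $1 == "advertiseAddress:" && $2 == ip { found = 1 }
        END { exit !found }
    '
}

# Example, run on a healthy master (assumes the ClusterStatus key exists):
#   kubectl -n kube-system get configmap kubeadm-config \
#       -o jsonpath='{.data.ClusterStatus}' | has_api_endpoint 172.31.182.152 \
#       && echo "stale entry still present"
```

The helper only reads the ConfigMap; the actual removal is still done with kubectl edit as described above.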


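2. Because I built the cluster with kubeadm, etcd runs as a pod on every master node, one instance per control-plane node. When the failed master (referred to here as k8s-001) was removed, its etcd member was not removed from the etcd cluster automatically, so it has to be removed by hand.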


Note that you first need to get a shell inside an etcd pod.

kubectl exec -it -n kube-system etcd-cn-hongkong.i-j6caps6av1mtyxyofmrx -- sh

export ETCDCTL_API=3
alias etcdctl='etcdctl --endpoints=https://172.31.182.153:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'

/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d4322ce19cc3f8da, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379

# Remove the member that no longer exists
/ # etcdctl member remove d4322ce19cc3f8da
Member d4322ce19cc3f8da removed from cluster ed812b9f85d5bcd7

/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379

# Listing again later: the node is back in the list under a new member ID
/ # etcdctl member list
cd4e1e075b1904b2, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
/ # exit
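
Picking the member ID out of the list by eye is easy to get wrong when the IDs look alike. As a rough sketch, the lookup could be scripted as below. It assumes the comma-separated format of etcdctl member list shown above (ID first, peer URL fourth), and the function name etcd_member_id_for_ip is hypothetical.

```shell
# etcd_member_id_for_ip IP
# Reads `etcdctl member list` output (default comma-separated format) on
# stdin and prints the ID of the member whose peer URL uses the given IP.
# Hypothetical helper, not part of etcdctl.
etcd_member_id_for_ip() {
    # Match "//IP:" so that 172.31.182.15 does not also match 172.31.182.152.
    awk -F', ' -v ip="$1" 'index($4, "//" ip ":") { print $1 }'
}

# Example, inside the etcd pod after setting up etcdctl as above:
#   id=$(etcdctl member list | etcd_member_id_for_ip 172.31.182.152)
#   echo "$id"    # inspect the ID before running: etcdctl member remove "$id"
```

Printing the ID and checking it before running etcdctl member remove seems the safer order, since removing the wrong member would take a healthy etcd instance out of the cluster.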



Finally, after every failed kubeadm join you have to run kubeadm reset on the node before trying again; only then will the next kubeadm join succeed.
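
The reset-then-join order can be wrapped in a small function so the reset is never skipped. This is a minimal sketch: the name rejoin_control_plane is made up, and the join arguments (endpoint, token, CA cert hash and so on) are not given in this post, so they are simply passed through from the caller.

```shell
# rejoin_control_plane ARGS...
# Resets the node, then retries the join. All arguments are passed
# unchanged to `kubeadm join`. Hypothetical wrapper for illustration.
rejoin_control_plane() {
    # -f skips the interactive confirmation prompt of `kubeadm reset`.
    kubeadm reset -f || return 1
    kubeadm join "$@"
}

# Example (the arguments are whatever your own join command uses):
#   rejoin_control_plane <your kubeadm join arguments>
```

If the reset itself fails, the function stops there instead of attempting a join on a half-cleaned node.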


Hope this helps!



Source: http://1t.click/aCXa

