hello云胜 · Tech and Life

Deploying a Highly Available Kubernetes (K8s) Cluster with kubeadm

Prerequisites

Server environment

Set the hostname
vim /etc/hostname
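On systemd-based distributions, hostnamectl applies the change immediately and persists it without a reboot; a minimal sketch (the example hostname is illustrative):

```shell
# Set the hostname both for the running system and in /etc/hostname.
hostnamectl set-hostname k8s-master1

# Verify the change took effect.
hostname
```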
Set up passwordless SSH

Pick one master machine and set up passwordless SSH from it to all the other machines.
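A minimal sketch, assuming root login and hostnames k8s-master2, k8s-master3, k8s-node1 (adjust to your actual inventory):

```shell
# Generate a key pair once on the chosen master (no passphrase).
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Copy the public key to every other node; you will be prompted
# for each node's root password one last time.
for host in k8s-master2 k8s-master3 k8s-node1; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@"$host"
done

# Verify: this should log in without a password prompt.
ssh root@k8s-master2 hostname
```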

Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux

Run getenforce; if it prints Disabled, SELinux is already off. Otherwise run:

setenforce 0
sed -i 's/^SELINUX=.*$/SELINUX=disabled/' /etc/selinux/config
Disable swap
swapoff -a
echo "vm.swappiness=0" >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf
sed -i 's$/dev/mapper/centos-swap$#/dev/mapper/centos-swap$g' /etc/fstab
free -m
ulimit
ulimit -n
1024

To apply temporarily:
# ulimit -SHn 65535

To make it permanent, add the following two lines:
# vim /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535

Time synchronization
First check whether ntp is installed:
[root@k8s-master21 ~]# rpm -qa ntp

If it is not installed, install it:
[root@k8s-master21 ~]# yum install ntp -y

Set the time zone first:
[root@k8s-master21 ~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@k8s-master21 ~]# echo "Asia/Shanghai" > /etc/timezone

Then sync against Aliyun's time server:
[root@k8s-master21 ~]# ntpdate time2.aliyun.com
26 Dec 15:35:57 ntpdate[7043]: step time server 203.107.6.88 offset -8.227227 sec

Then add the sync to cron: run crontab -e, add */5 * * * * ntpdate time2.aliyun.com, save and exit:
[root@k8s-master21 ~]# crontab -e
*/5 * * * * ntpdate time2.aliyun.com

Finally, run the sync at boot: open /etc/rc.local, append ntpdate time2.aliyun.com, save and exit:
[root@k8s-master21 ~]# vim /etc/rc.local
ntpdate time2.aliyun.com
Enable IPv4 forwarding

This lets Kubernetes inspect and forward network traffic:

cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
Load the ip_vs modules
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF

chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

Deploy Docker

Remove any old Docker packages to avoid conflicts:
yum remove docker \
           docker-client \
           docker-client-latest \
           docker-common \
           docker-latest \
           docker-latest-logrotate \
           docker-logrotate \
           docker-engine
Install dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2

Configure the Aliyun mirror repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

Install Docker
yum install docker-ce
mkdir -p /etc/docker/

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://6kx4zyno.mirror.aliyuncs.com"]
}
EOF

"exec-opts": ["native.cgroupdriver=systemd"] is required for kubelet to start.

Then restart Docker:

systemctl enable docker && systemctl daemon-reload && systemctl restart docker

Deploy Kubernetes

Install kubeadm, kubelet, and kubectl

Configure the package repo:
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

yum install -y kubectl-1.22.8 kubelet-1.22.8 kubeadm-1.22.8

# Other kubelet/kubeadm/kubectl versions also work, but they must match your Docker version.
# List the installable versions:
# yum list kubeadm kubelet kubectl --showduplicates | sort -r
Configure the kubelet pause image

The default pause image comes from the gcr.io registry, which may be unreachable from mainland China, so point kubelet at Aliyun's copy:

cat > /etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.2"
EOF

Enable kubelet on boot:
systemctl daemon-reload && systemctl enable --now kubelet

Install the master nodes

kubeadm init needs a config file. You can print a default one to start from:

kubeadm config print init-defaults

On all master nodes:

vim /root/kubeadm-config.yaml

Adjust the contents to your environment; see "kubeadm Configuration (v1beta3) | Kubernetes".

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1x.xx.12.181
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 1x.xx.12.181:6443
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.22.0
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 1x.xx.0.0/12
scheduler: {}

Pre-pulling the images saves time during initialization:

kubeadm config images pull --config /root/kubeadm-config.yaml

Initialize on the master1 node:

kubeadm init --config /root/kubeadm-config.yaml --upload-certs

Initialization generates the certificates and config files under /etc/kubernetes; after that, the other master nodes simply join master1.

On success it prints:


Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:30e24f2bf41b30864c9a2aff54aece349f3bdff2882b9b54145cd9c7976e8ef9 \
--control-plane --certificate-key e1fec22ea86d49cf3f74bab846075cf234fb19f4de18243bcdd220537fa4285f

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:30e24f2bf41b30864c9a2aff54aece349f3bdff2882b9b54145cd9c7976e8ef9

kubeadm manages its components as pods: every control-plane component runs as a container, started from the YAML files under /etc/kubernetes/manifests. This is the directory kubelet watches for static pods; drop a pod manifest in it and kubelet manages that pod's lifecycle for you.
Inside the directory you can see the following files:

cd /etc/kubernetes/manifests
ls
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml

Configure environment variables on the master1 node so kubectl can access the Kubernetes cluster.

kubectl talks to Kubernetes through the admin.conf file:

cat <<EOF >> /root/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
source /root/.bashrc

Check node status:

[root@k8s-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master1 NotReady control-plane,master 13m v1.22.8

Check the services:

[root@k8s-master1 ~]# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 1x.xx.0.1 <none> 443/TCP 19m
kube-system kube-dns ClusterIP 1x.xx.0.10 <none> 53/UDP,53/TCP,9153/TCP 18m

With a kubeadm-based installation, all system components run as containers in the kube-system namespace.

Check the pods:

[root@k8s-master1 ~]# kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-7d89d9b6b8-6xpvs 0/1 Pending 0 49m <none> <none> <none> <none>
kube-system coredns-7d89d9b6b8-d5q2h 0/1 Pending 0 49m <none> <none> <none> <none>
kube-system etcd-k8s-master1 1/1 Running 0 50m 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-apiserver-k8s-master1 1/1 Running 1 (34m ago) 51m 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-controller-manager-k8s-master1 1/1 Running 1 (108s ago) 3m10s 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-proxy-kb9w9 1/1 Running 0 49m 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-scheduler-k8s-master1 1/1 Running 1 (109s ago) 3m10s 1x.xx.12.181 k8s-master1 <none> <none>

All pods are Running.

Join the other masters

Use the join command printed after the successful init:

kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:e8158944761027754f2ff819e8d299af1095bbc12699a2f34b6d8d80a67d8b6f \
--control-plane --certificate-key 707aa3ccb1d514ea0270683f7c0ed351f08976617cebe902f1771f6198b6b344

Note: tokens are valid for 24 hours by default; once one expires you need to generate a new one.

[root@k8s-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master1 NotReady control-plane,master 59m v1.22.8
k8s-master2 NotReady control-plane,master 2m38s v1.22.8
k8s-master3 NotReady control-plane,master 2m37s v1.22.8
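If the token has expired, a fresh join command can be generated on master1; both commands below are standard kubeadm subcommands:

```shell
# Create a new bootstrap token and print a ready-to-use worker join command.
kubeadm token create --print-join-command

# An additional control-plane node also needs a new certificate key:
kubeadm init phase upload-certs --upload-certs
# Append the printed key to the join command as:
#   --control-plane --certificate-key <key>
```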


Join worker nodes

kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:e8158944761027754f2ff819e8d299af1095bbc12699a2f34b6d8d80a67d8b6f


Install Calico

Run this on the master1 node.
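The post doesn't include the commands; a common approach is to apply the upstream manifest. The URL below is an assumption -- check the Calico docs for a release compatible with Kubernetes 1.22 (the default Calico pod CIDR, 192.168.0.0/16, matches the podSubnet configured above):

```shell
# Download and apply the Calico manifest on master1.
curl -O https://docs.projectcalico.org/manifests/calico.yaml
kubectl apply -f calico.yaml

# Watch the calico-node pods come up; nodes turn Ready once networking works.
kubectl get pods -n kube-system -w
```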

Troubleshooting

Installing Docker requires container-selinux >= 2:2.74

Error: Package: containerd.io-1.6.6-3.1.el7.x86_64 (docker-ce-stable)
       Requires: container-selinux >= 2:2.74

http://mirror.centos.org/centos/7/extras/x86_64/Packages/

yum install -y http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.119.2-1.911c772.el7_8.noarch.rpm

A better fix:

# Go to the yum repo config directory
cd /etc/yum.repos.d
mv CentOS-Base.repo CentOS-Base.repo_bak

Add an entry at the top of /etc/yum.repos.d/docker-ce.repo with the following content:
[centos-extras]
name=Centos extras - $basearch
baseurl=http://mirror.centos.org/centos/7/extras/x86_64
enabled=1
gpgcheck=0

Save and exit.

# Then install:
yum -y install slirp4netns fuse-overlayfs container-selinux

controller-manager and scheduler are unhealthy

[root@k8s-master1 ~]# kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-7d89d9b6b8-6xpvs 0/1 Pending 0 28m <none> <none> <none> <none>
kube-system coredns-7d89d9b6b8-d5q2h 0/1 Pending 0 28m <none> <none> <none> <none>
kube-system etcd-k8s-master1 1/1 Running 0 30m 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-apiserver-k8s-master1 1/1 Running 1 (14m ago) 30m 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-controller-manager-k8s-master1 0/1 CrashLoopBackOff 7 (4m31s ago) 30m 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-proxy-kb9w9 1/1 Running 0 28m 1x.xx.12.181 k8s-master1 <none> <none>
kube-system kube-scheduler-k8s-master1 0/1 CrashLoopBackOff 6 (4m27s ago) 30m 1x.xx.12.181 k8s-master1 <none> <none>

kube-controller-manager-k8s-master1 and kube-scheduler-k8s-master1 are both in CrashLoopBackOff.

Check the cause:

Liveness probe failed: Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused

This happens because kube-controller-manager.yaml and kube-scheduler.yaml set the port to 0 by default; commenting that line out in both files fixes it. (Do this on every master node.)

vim /etc/kubernetes/manifests/kube-scheduler.yaml
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
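The same edit can be scripted; a sketch that comments out the --port=0 flag, assuming the stock kubeadm manifest layout:

```shell
# Comment out the "- --port=0" argument in both static pod manifests.
# kubelet notices the file change and restarts the pods automatically.
for f in /etc/kubernetes/manifests/kube-scheduler.yaml \
         /etc/kubernetes/manifests/kube-controller-manager.yaml; do
    if [ -f "$f" ]; then
        sed -i 's/^\(\s*\)- --port=0/\1# - --port=0/' "$f"
    fi
done
```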

Restart kubelet:

systemctl restart kubelet.service

Error from server: etcdserver: request timed out

docker ps -a
docker logs -f <etcd container ID>

The etcd container logs show an error:

leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk

Check the server's I/O performance. It doesn't look good; confirm with sar:

sar -d 1 10

Average:          DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await  svctm   %util
Average:          vda  12.80      0.00    248.00     19.38      2.28  177.77  35.69   45.68
Average:          vdb   0.00      0.00      0.00      0.00      0.00    0.00   0.00    0.00
Average:  centos-root  24.70      0.00    252.00     10.20      2.29   92.52  40.52  100.09

await: average wait time per device I/O operation

svctm: average service time per device I/O operation

%util: percentage of time the device spent servicing I/O

If svctm is close to await, there is almost no I/O queueing and disk performance is good. If await is far higher than svctm, the I/O queue is too long and applications on the system will slow down; the fix is to move to faster disks.
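The await-versus-svctm comparison can be automated; a small sketch (the sample line and the 3x threshold are illustrative assumptions) that flags devices whose await far exceeds svctm:

```shell
# Parse "sar -d" Average lines and flag devices where await >> svctm.
# The sample line mimics the output above; in practice pipe in: sar -d 1 10
printf 'Average: vda 12.80 0.00 248.00 19.38 2.28 177.77 35.69 45.68\n' |
awk '$1 == "Average:" && $2 != "DEV" {
    await = $8; svctm = $9
    if (svctm > 0 && await / svctm > 3)
        printf "%s: await %.2f >> svctm %.2f, I/O queue is saturated\n", $2, await, svctm
}'
# → vda: await 177.77 >> svctm 35.69, I/O queue is saturated
```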

iostat -dxk 1 10

iostat also shows disk metrics; the disk performance really is poor.

For comparison, another machine:

Average:  DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:  vda  51.10      0.00    554.40     10.85      0.03   0.65   0.70   3.56
Average:  vdb   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00

The gap is enormous.

These test machines are old servers virtualized with OpenStack, backed by aging SATA disks with no tuning. That level of disk performance cannot support an etcd cluster, so only a single-node install is viable here.

The connection to the server 1x.xx.12.181:6443 was refused - did you specify the right host or port?

Check the service statuses:

systemctl status docker.service
systemctl status kubelet.service
systemctl status firewalld.service   # should be inactive

All normal.

Check whether the port is listening:

netstat -pnlt | grep 6443

Nothing is listening on 6443.

Check the kubelet logs:

journalctl -xeu kubelet

Uninstall

kubeadm reset -f
modprobe -r ipip
yum -y remove kubeadm* kubectl* kubelet* docker*
rm -rf ~/.kube/
rm -rf /etc/kubernetes/
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /etc/systemd/system/kubelet.service
rm -rf /usr/bin/kube*
rm -rf /var/lib/etcd
rm -rf /var/etcd