Sealos 4.0.0
Lessons learned:
- Make the / root partition large; the images are big. 300G.
- Because the images are large, allow plenty of time for the install.
- Run the install under nohup so a dropped connection doesn't kill it.
If the big disk is not mounted at /, create symlinks:
```shell
mkdir -p /data/run/containerd
mkdir -p /data/var/lib/containers
mkdir -p /data/var/lib/kubelet
ln -s /data/run/containerd /run/containerd
ln -s /data/var/lib/containers /var/lib/containers
ln -s /data/var/lib/kubelet /var/lib/kubelet
```
Download the sealos CLI
Install jq first.
List the available sealos release versions:
```shell
curl --silent "https://api.github.com/repos/labring/sealos/releases" | jq -r '.[].tag_name'
```
Pick a stable version from the list:
```
v5.0.0-beta5
v5.0.0-beta4
v4.4.0-beta3
v5.0.0-beta3
v5.0.0-beta2
v5.0.0-beta1
v4.3.7
v5.0.0-alpha2
v4.3.7-rc1
v4.3.6
```
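Picking out the stable tags can also be done automatically by piping the jq output through grep. A small sketch, using a subset of the tag list above as sample input:

```shell
# drop pre-release tags (anything with an -alpha/-beta/-rc suffix)
printf '%s\n' v5.0.0-beta5 v4.3.7 v5.0.0-alpha2 v4.3.7-rc1 v4.3.6 \
  | grep -Ev -- '-(alpha|beta|rc)'
# prints: v4.3.7 and v4.3.6
```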
I'll use the latest stable version, v4.3.7.
Download:
```shell
wget https://github.com/labring/sealos/releases/download/v4.3.7/sealos_4.3.7_linux_amd64.tar.gz
```
Install:
```shell
tar zxvf sealos_4.3.7_linux_amd64.tar.gz sealos && chmod +x sealos && mv sealos /usr/bin
```
Prepare the servers
Set up passwordless SSH
The sealos command runs on one master, and that master needs passwordless login to every other server:
```shell
ssh-keygen
# on each target server, append master0's public key:
cat >> ~/.ssh/authorized_keys << EOF
<contents of master0's id_rsa.pub>
EOF
firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='10.xx.xx.194' service name='ssh' accept"
firewall-cmd --reload
echo "sshd: <master node IP>" >> /etc/hosts.allow
```
Disable SELinux
Disable SELinux so containers can access the host filesystem.
```
# Check; my servers shipped with it already off
[root@my-paas-master0 ~]# getenforce
Permissive
```
To disable it:
```shell
setenforce 0  # temporary, until reboot
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config  # permanent, takes effect after reboot
```
Disable swap
On OOM the application should be killed outright rather than limp along on swap and trigger cascading failures.
```
# Swap total is 0; my servers shipped with swap already off
[root@my-paas-master0 ~]# free -g
              total        used        free      shared  buff/cache   available
Mem:              7           0           6           0           0           7
Swap:             0           0           0
```
To disable it:
```shell
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
```
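The sed expression comments out every fstab line containing ` swap `. A quick check of the pattern on a sample line (the device path here is made up):

```shell
# lines containing " swap " get a '#' prefixed; other lines pass through unchanged
echo "/dev/mapper/cl-swap swap swap defaults 0 0" \
  | sed '/ swap / s/^\(.*\)$/#\1/g'
# prints: #/dev/mapper/cl-swap swap swap defaults 0 0
```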
Enable NAT forwarding
Without packet forwarding, the system only handles packets destined for itself and will not forward packets addressed elsewhere. The symptom: with a NodePort service, only the node the pod is running on can serve requests; the other nodes cannot.
```shell
firewall-cmd --add-masquerade --permanent
# check whether NAT forwarding (masquerade) is enabled
firewall-cmd --query-masquerade
```
Firewall ports
In test environments I simply turn the firewall off; that is not acceptable in production.
Official docs: Ports and Protocols | Kubernetes


The k8s masters need the following ports open:
```shell
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250-10259/tcp
#firewall-cmd --permanent --add-port=10259/tcp   # covered by the range above
#firewall-cmd --permanent --add-port=10257/tcp
# if the masters should also serve NodePort traffic:
firewall-cmd --permanent --add-port=30000-32767/tcp
# a few more the official docs don't list, added from experience and painful debugging;
# opening a few extras does no harm
#firewall-cmd --permanent --add-port=10251/tcp
#firewall-cmd --permanent --add-port=10252/tcp
#firewall-cmd --permanent --add-port=10255/tcp
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-port=443/udp
firewall-cmd --permanent --add-port=53/udp
firewall-cmd --permanent --add-port=53/tcp
firewall-cmd --permanent --add-port=9153/tcp
firewall-cmd --reload
```
The k8s nodes need the following ports open:
```shell
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --reload
```
If you use Istio, also add the istio-pilot ports to the firewall:
```shell
firewall-cmd --permanent --add-port=15010-15014/tcp
```
Raise the Linux limits on threads and open files:
```shell
echo "fs.file-max = 655350" >> /etc/sysctl.conf
echo "kernel.pid_max = 655350" >> /etc/sysctl.conf
echo "root soft nofile 655350" >> /etc/security/limits.conf
echo "root hard nofile 655350" >> /etc/security/limits.conf
```
Install socat
socat is a networking tool; k8s uses it for pod data exchange (e.g. port forwarding).
Load the br_netfilter module
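The original notes don't show the commands for this step; a typical CentOS 7 setup looks roughly like this (a sketch of common practice, not taken from the original):

```shell
modprobe br_netfilter
# load the module on every boot
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
# let iptables see bridged traffic (needed by kube-proxy and most CNIs)
cat <<EOF > /etc/sysctl.d/99-k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system
```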
Run the install (passwordless login is configured, so no password prompts):
```shell
nohup sealos run \
  --masters 10.xx.xx.194 \
  --nodes 1x.xx.12.209,1x.xx.12.216 \
  labring/kubernetes:v1.25.0 \
  labring/helm:v3.8.2 \
  labring/calico:v3.24.1 >sealos.log 2>&1 &
```
If something goes wrong, clean up first and then reinstall.
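For the cleanup, sealos has a reset subcommand; something like this should tear the cluster down before retrying (a sketch; exact flags may vary between sealos 4.x releases):

```shell
# tear down the cluster that `sealos run` installed, then re-run the install
sealos reset
```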
Spin up an nginx to test:
```shell
kubectl run ng --image=harbor-test.xxx.net/base/nginx:1.25.2
kubectl expose pod ng --port=80 --target-port=80 --type=NodePort
```

Turn the firewall back on:
```shell
systemctl start firewalld
```
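To verify, look up the assigned NodePort and curl it from another machine. A sketch: the service name `ng` comes from the expose command above, the node IP is one of the nodes from the sealos run, and the `<nodeport>` placeholder must be filled in from the svc output:

```shell
# find the NodePort assigned to the 'ng' service
kubectl get svc ng
# then, from a different server:
curl -I http://1x.xx.12.209:<nodeport>
```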
Still reachable, no problems.
Install KubeSphere
Install NFS
Check which ports NFS is using:
```
# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  59758  status
    100024    1   tcp  33494  status
    100005    1   udp  20048  mountd
    100005    1   tcp  20048  mountd
    100005    2   udp  20048  mountd
    100005    2   tcp  20048  mountd
    100005    3   udp  20048  mountd
    100005    3   tcp  20048  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    3   udp   2049  nfs_acl
    100021    1   udp  35373  nlockmgr
    100021    3   udp  35373  nlockmgr
    100021    4   udp  35373  nlockmgr
    100021    1   tcp  22351  nlockmgr
    100021    3   tcp  22351  nlockmgr
    100021    4   tcp  22351  nlockmgr
```
NFS needs five services reachable: mountd, nfs, nlockmgr, portmapper, and rquotad.
nfs and portmapper use fixed ports (2049 and 111 respectively); the other three use random ports.
Add all of these ports to the firewall:
```shell
firewall-cmd --permanent --add-port=111/tcp
firewall-cmd --permanent --add-port=111/udp
firewall-cmd --permanent --add-port=2049/tcp
firewall-cmd --permanent --add-port=2049/udp
firewall-cmd --permanent --add-port=59758/udp
firewall-cmd --permanent --add-port=33494/tcp
firewall-cmd --permanent --add-port=20048/tcp
firewall-cmd --permanent --add-port=20048/udp
firewall-cmd --permanent --add-port=22351/tcp
firewall-cmd --permanent --add-port=35373/udp
```
The random ports differ on every server, so adding them by hand is tedious. This loop adds whatever rpcinfo reports:
```shell
for i in $(rpcinfo -p | awk 'NR>1 {print $4}' | sort -u); do
  firewall-cmd --permanent --add-port=$i/tcp
  firewall-cmd --permanent --add-port=$i/udp
done
```
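An alternative to chasing the random ports is to pin them. On CentOS 7 the variable NFS services can be given fixed ports in /etc/sysconfig/nfs (a sketch of standard practice, not from the original; the port numbers reuse the ones rpcinfo showed above):

```shell
# /etc/sysconfig/nfs (CentOS 7) -- pin the variable ports so firewall rules survive reboots
MOUNTD_PORT=20048
STATD_PORT=59758
LOCKD_TCPPORT=22351
LOCKD_UDPPORT=35373
```

After editing, restart the NFS services (e.g. nfs-server and rpc-statd) so the new ports take effect.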
Mount the NFS share on each node and check the export:
```shell
mkdir -p /data/nfsstorage
mount -t nfs 10.xx.xx.194:/data/nfsstorage /data/nfsstorage
showmount -e 10.xx.xx.194
```
A port KubeSphere seems to need (a guess, found while troubleshooting; still to be verified):
```shell
firewall-cmd --permanent --add-port=9115/tcp
```
Ports Calico needs
Per the official docs, open the following:
System requirements | Calico Documentation (tigera.io)

```shell
firewall-cmd --permanent --add-port=179/tcp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --permanent --add-port=5473/tcp
firewall-cmd --permanent --add-port=51820/udp
firewall-cmd --permanent --add-port=51821/udp
firewall-cmd --permanent --add-port=2379/tcp
```
That list seems incomplete, though: scanning this host with nmap -v -A from another machine showed port 9999 in use as well.
```shell
firewall-cmd --permanent --add-port=9999/tcp
```
Problems
Port 5443 turned out to be needed as well:
```shell
firewall-cmd --permanent --add-port=5443/tcp
```
Ports KubeSphere needs
Most were opened earlier; add these few more:
```shell
firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --permanent --add-port=9099-9100/tcp
firewall-cmd --permanent --add-port=8443/tcp
```
Still broken; for now the only workaround is to keep the firewall off.
KubeSphere's default password is P@88w0rd.
I changed it to xxxPaasNo1.
Enable plugins
Edit the clusterconfiguration resource.

nerdctl
```shell
wget http://1x.xx.66.1/4zLKJ/nerdctl-full-0.22.2-linux-amd64.tar.gz
mkdir nerdctl
tar -zxvf nerdctl-full-0.22.2-linux-amd64.tar.gz -C nerdctl
cp nerdctl/lib/systemd/system/*.service /etc/systemd/system/
systemctl enable buildkit containerd
systemctl start buildkit containerd
```
Alternatively, extract just the nerdctl binary and fetch buildkit separately:
```shell
mkdir -p /usr/local/containerd/bin && tar -zxvf nerdctl-full-0.22.2-linux-amd64.tar.gz nerdctl && mv nerdctl /usr/local/containerd/bin
wget http://transfer.paas.xxx.net/1usiGhY/buildkit-v0.10.3.linux-amd64.tar.gz
```
```shell
tar -zxvf buildkit-v0.10.3.linux-amd64.tar.gz -C /usr/local/containerd/
ln -s /usr/local/containerd/bin/buildkitd /usr/local/bin/buildkitd
ln -s /usr/local/containerd/bin/buildctl /usr/local/bin/buildctl
```
Create the systemd unit files. /etc/systemd/system/buildkit.socket:
```ini
[Unit]
Description=BuildKit
Documentation=https://github.com/moby/buildkit

[Socket]
ListenStream=%t/buildkit/buildkitd.sock

[Install]
WantedBy=sockets.target
```
/etc/systemd/system/buildkit.service:
```ini
[Unit]
Description=BuildKit
Documentation=https://github.com/moby/buildkit
Requires=buildkit.socket
After=buildkit.socket

[Service]
ExecStart=/usr/local/bin/buildkitd --oci-worker=false --containerd-worker=true

[Install]
WantedBy=multi-user.target
```
Start buildkitd:
```shell
systemctl daemon-reload
systemctl enable buildkit
systemctl start buildkit
```
Installation with external network access
The router needs traffic-forwarding rules configured.
```shell
yum install traceroute.x86_64 -y
yum install net-tools -y
```
Generate the config file:
```shell
sealos gen labring/kubernetes:v1.22.11 \
  labring/calico:v3.22.1 \
  labring/openebs:v1.9.0 \
  registry.cn-shenzhen.aliyuncs.com/cnmirror/kubesphere:v3.3.0 \
  --masters 1x.xx.232.237,1x.xx.232.238,1x.xx.232.239 \
  --nodes 1x.xx.232.233,1x.xx.232.234,1x.xx.232.236 > Clusterfile
```
Edit the config file:
```yaml
apiVersion: apps.sealos.io/v1beta1
kind: Cluster
metadata:
  creationTimestamp: null
  name: default
spec:
  hosts:
  - ips:
    - 1x.xxx.5.6:22
    roles:
    - master
    - amd64
  - ips:
    - 1x.xxx.5.7:22
    - 1x.xxx.5.8:22
    - 1x.xxx.5.9:22
    - 1x.xxx.5.10:22
    roles:
    - node
    - amd64
  image:
  - labring/kubernetes:v1.22.11
  - labring/calico:v3.22.1
  - labring/openebs:v1.9.0
  - registry.cn-shenzhen.aliyuncs.com/cnmirror/kubesphere:v3.3.0
  ssh:
    pk: /root/.ssh/id_rsa
    port: 22
    user: root
status: {}
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: 1x.xxx.36.0/22
---
apiVersion: apps.sealos.io/v1beta1
kind: Config
metadata:
  name: calico
spec:
  path: manifests/calico.yaml
  data: |
    apiVersion: operator.tigera.io/v1
    kind: Installation
    metadata:
      name: default
    spec:
      # Configures Calico networking.
      calicoNetwork:
        bgp: Enabled
        # Note: The ipPools section cannot be modified post-install.
        ipPools:
        - blockSize: 26
          # Note: Must be the same as podCIDR
          cidr: 1x.xxx.36.0/22
          encapsulation: None
          natOutgoing: Enabled
          nodeSelector: all()
        nodeAddressAutodetectionV4:
          interface: "eth.*|en.*"
```
Subnet plan for reference:
```
1x.xxx.40.0/21  production dubbo; gateway 1x.xxx.47.254, netmask 255.255.248.0
1x.xxx.36.0/22  test dubbo;       gateway 1x.xxx.39.254
```
Deploy:
```shell
nohup sealos apply -f Clusterfile >sealos.log 2>&1 &
```
Prometheus fails to start
The log shows: MountVolume.SetUp failed for volume "secret-kube-etcd-client-certs" : secret "kube-etcd-client-certs" not found
Fix:
```shell
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs
```
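The secret created above is empty, which merely satisfies the volume mount. If etcd monitoring is actually wanted, the secret should carry the real etcd client certificates; a sketch assuming a kubeadm-built cluster's default pki paths and the key names KubeSphere expects (verify both against your setup):

```shell
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs \
  --from-file=etcd-client-ca.crt=/etc/kubernetes/pki/etcd/ca.crt \
  --from-file=etcd-client.crt=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --from-file=etcd-client.key=/etc/kubernetes/pki/apiserver-etcd-client.key
```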
LDAP
```shell
kubectl -n kubesphere-system edit cc ks-installer
```
Configure the authentication section:
```yaml
authentication:
  jwtSecret: ''
  maximumClockSkew: 10s
  multipleLogin: true
  oauthOptions:
    accessTokenMaxAge: 0
    accessTokenInactivityTimeout: 30m
    identityProviders:
    - name: ldap
      type: LDAPIdentityProvider
      mappingMethod: auto
      provider:
        host: 1x.xxx.7.142:389
        managerDN: cn=hopuser,o=services
        managerPassword: hopuser@2014
        userSearchBase: o=xxx
        loginAttribute: cn
        mailAttribute: mail
```
Watch the ks-installer logs:
```shell
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
```
When it finishes,
restart the ks-apiserver component:
```shell
kubectl -n kubesphere-system rollout restart deploy/ks-apiserver
```
If that still doesn't work, restart openldap and ks-controller-manager (not sure which of these fixed it):
```shell
kubectl -n kubesphere-system delete pod openldap-0
kubectl -n kubesphere-system rollout restart deploy/ks-controller-manager
```
Change the logo
The files live on 1x.xxx.2.103 under /root/ks; reach it via 7.21.
http://transfer.paas.xxx.net/1QJ9T6W/config.yaml
Dockerfile:
```dockerfile
FROM harbor-test.xxx.net/kubesphere/ks-console:v3.3.0
COPY ./logo.svg /opt/kubesphere/console/dist/assets/
COPY ./login-logo.svg /opt/kubesphere/console/dist/assets/
COPY ./favicon.ico /opt/kubesphere/console/dist/assets/
COPY ./locale-zh.03f0fb248751b0a3bd2d.json /opt/kubesphere/console/dist/
COPY ./config.yaml /opt/kubesphere/console/server/
```
```shell
docker build -t kubesphere/ks-console:v3.3.rrs .
docker tag kubesphere/ks-console:v3.3.rrs harbor-test.xxx.net/kubesphere/ks-console:v3.3.rrs
docker push harbor-test.xxx.net/kubesphere/ks-console:v3.3.rrs
```
No longer needed:
```shell
nerdctl pull harbor-test.xxx.net/kubesphere/ks-console:v3.3.rrs
```
Then update the ks-console deployment to use the new image.


podman pull issue
```
Error initializing source docker://registry.fedoraproject.org/java:openjdk-8-jre-alpine: Error reading manifest openjdk-8-jre-alpine in registry.fedoraproject.org/java: manifest unknown: manifest unknown
```
Multi-cluster management
On the host cluster, run the following to get the jwtSecret:
```shell
kubectl -n kubesphere-system get cm kubesphere-config -o yaml | grep -v "apiVersion" | grep jwtSecret
```
vakPJMEze4ws8mHgCq2jlvpVD3piOBhp
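If you want just the value (e.g. for scripting), the grep output can be trimmed with awk; a sketch on a sample line shaped like the grep output above:

```shell
# second whitespace-separated field is the secret value
echo "      jwtSecret: vakPJMEze4ws8mHgCq2jlvpVD3piOBhp" | awk '{print $2}'
# prints: vakPJMEze4ws8mHgCq2jlvpVD3piOBhp
```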
On the member cluster, run:
```shell
kubectl edit cc ks-installer -n kubesphere-system
```
In the ks-installer YAML, set jwtSecret to the value obtained above:
```yaml
authentication:
  jwtSecret: vakPJMEze4ws8mHgCq2jlvpVD3piOBhp
```
Scroll down and set clusterRole to member:
```yaml
multicluster:
  clusterRole: member
```
Run this command to watch progress:
```shell
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
```
Add the cluster
Get the member cluster's kubeconfig.
On the host cluster's web console, open cluster management and click Add Cluster.
Problem: token not found in cache
The real cause is that the token had expired.
The auth section in ks-installer:
```yaml
authentication:
  jwtSecret: vakPJMEze4ws8mHgCq2jlvpVD3piOBhp
  maximumClockSkew: 10s
  multipleLogin: true
  oauthOptions:
    accessTokenInactivityTimeout: 30m
    accessTokenMaxAge: 0
    identityProviders:
    - mappingMethod: auto
      name: ldap
      provider:
        host: 1x.xxx.7.142:389
        loginAttribute: cn
        mailAttribute: mail
        managerDN: cn=hopuser,o=services
        managerPassword: hopuser@2014
        userSearchBase: o=xxx
      type: LDAPIdentityProvider
```
Setting accessTokenMaxAge to 0 means tokens never expire.
It had previously been set to 10h, so once the KubeSphere installation was more than 10 hours old, joining a member cluster failed with an expired token.