Continuing from the previous posts in this series:

openEuler 20.03 (LTS-SP2) aarch64 cephadm deployment of Ceph 18.2.0 [1]: offline deployment, preparing the base environment (CSDN blog)

openEuler 20.03 (LTS-SP2) aarch64 cephadm deployment of Ceph 18.2.0 [2]: offline deployment, configuring podman registries, deploying a private registry, preparing offline images (CSDN blog)

openEuler 20.03 (LTS-SP2) aarch64 cephadm deployment of Ceph 18.2.0 [3]: bootstrap, fixing failures caused by the firewalld firewall (CSDN blog)

Current state

ceph orch host ls

Re-running bootstrap

Because the hostname on 10.2.1.176 had not been set before the first bootstrap was run, all data is wiped and bootstrap is run again to keep the hostnames consistent. (Before the first bootstrap, make sure the correct hostname has already been set with hostnamectl.)

Remove the cluster

# force-remove all running containers
podman ps | grep -v CONTAINER | awk '{print $1}' | xargs podman rm -f
# wipe the Ceph configuration, data, and log directories
rm /etc/ceph/ -rf
rm /var/lib/ceph -rf
rm /var/log/ceph/ -rf
 
systemctl restart podman
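cephadm also provides a dedicated cleanup subcommand that removes a cluster's daemons and data by fsid; a minimal alternative sketch, with <old-fsid> standing in for the fsid of the cluster being removed:

cephadm rm-cluster --force --fsid <old-fsid>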

Set up the private registry

podman load -i podman-images/registry-2.tar 
# remove the old registry data
rm -rf /var/lib/registry
mkdir -p /var/lib/registry
podman run --privileged -d --name registry -p 5000:5000 -v /var/lib/registry:/var/lib/registry --restart=always registry:2
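Before pushing, it is worth confirming the registry answers; a quick check against the registry v2 API (assuming plain HTTP access is allowed, as configured in part 2):

curl http://10.2.1.176:5000/v2/_catalog
# expected right after a fresh start: {"repositories":[]}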

Push the images

podman push 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0
podman push 10.2.1.176:5000/quay.io/prometheus/prometheus:v2.43.0
podman push 10.2.1.176:5000/docker.io/grafana/loki:2.4.0
podman push 10.2.1.176:5000/docker.io/grafana/promtail:2.4.0
podman push 10.2.1.176:5000/quay.io/prometheus/node-exporter:v1.5.0
podman push 10.2.1.176:5000/quay.io/prometheus/alertmanager:v0.25.0
podman push 10.2.1.176:5000/quay.io/ceph/ceph-grafana:9.4.7
podman push 10.2.1.176:5000/quay.io/ceph/haproxy:2.3
podman push 10.2.1.176:5000/quay.io/ceph/keepalived:2.2.4
podman push 10.2.1.176:5000/docker.io/maxwo/snmp-notifier:v1.2.1
podman push 10.2.1.176:5000/quay.io/omrizeneva/elasticsearch:6.8.23
podman push 10.2.1.176:5000/quay.io/jaegertracing/jaeger-collector:1.29
podman push 10.2.1.176:5000/quay.io/jaegertracing/jaeger-agent:1.29
podman push 10.2.1.176:5000/quay.io/jaegertracing/jaeger-query:1.29
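The pushes can also be scripted instead of typed one by one; a minimal sketch that pushes every local image already tagged with the private registry prefix:

# push every local image tagged for the private registry
podman images --format '{{.Repository}}:{{.Tag}}' | grep '^10.2.1.176:5000/' | while read img; do
    podman push "$img"
done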

bootstrap

[root@ceph-176 ~]# 
[root@ceph-176 ~]# cephadm --image 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0 bootstrap --registry-url=10.2.1.176:5000 --registry-username=x --registry-password=x --mon-ip 10.2.1.176 --log-to-file
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit ntpd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 3.4.4 is present
systemctl is present
lvcreate is present
Unit ntpd.service is enabled and running
Host looks OK
Cluster fsid: 75dc1df2-97e8-11ee-8a99-faa4b605ed00
Verifying IP 10.2.1.176 port 3300 ...
Verifying IP 10.2.1.176 port 6789 ...
Mon IP `10.2.1.176` is in CIDR network `10.2.1.0/24`
Mon IP `10.2.1.176` is in CIDR network `10.2.1.0/24`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Logging into custom registry.
Pulling container image 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0...
Ceph version: ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
firewalld ready
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 10.2.1.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Verifying port 8765 ...
Verifying port 8443 ...
firewalld ready
firewalld ready
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host ceph-176...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying ceph-exporter service with default placement...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...

Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for mgr epoch 9...
mgr epoch 9 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
firewalld ready
Ceph Dashboard is now available at:

         URL: https://ceph-176:8443/
        User: admin
    Password: wont7z1rg0

Enabling client.admin keyring and conf on hosts with "admin" label
Saving cluster configuration to /var/lib/ceph/75dc1df2-97e8-11ee-8a99-faa4b605ed00/config directory
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:

    sudo /usr/local/sbin/cephadm shell --fsid 75dc1df2-97e8-11ee-8a99-faa4b605ed00 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Or, if you are only running a single cluster on this host:

    sudo /usr/local/sbin/cephadm shell 

Please consider enabling telemetry to help improve Ceph:

    ceph telemetry on

For more information see:

    https://docs.ceph.com/en/latest/mgr/telemetry/

Bootstrap complete.
 

All that effort just to change a hostname...

Point all container images at the internal registry

ceph config set mgr mgr/cephadm/container_image_alertmanager 10.2.1.176:5000/quay.io/prometheus/alertmanager:v0.25.0
ceph config set mgr mgr/cephadm/container_image_base 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0
ceph config set mgr mgr/cephadm/container_image_elasticsearch 10.2.1.176:5000/quay.io/omrizeneva/elasticsearch:6.8.23
ceph config set mgr mgr/cephadm/container_image_grafana 10.2.1.176:5000/quay.io/ceph/ceph-grafana:9.4.7
ceph config set mgr mgr/cephadm/container_image_haproxy 10.2.1.176:5000/quay.io/ceph/haproxy:2.3
ceph config set mgr mgr/cephadm/container_image_jaeger_agent 10.2.1.176:5000/quay.io/jaegertracing/jaeger-agent:1.29
ceph config set mgr mgr/cephadm/container_image_jaeger_collector 10.2.1.176:5000/quay.io/jaegertracing/jaeger-collector:1.29
ceph config set mgr mgr/cephadm/container_image_jaeger_query 10.2.1.176:5000/quay.io/jaegertracing/jaeger-query:1.29
ceph config set mgr mgr/cephadm/container_image_keepalived 10.2.1.176:5000/quay.io/ceph/keepalived:2.2.4
ceph config set mgr mgr/cephadm/container_image_loki 10.2.1.176:5000/docker.io/grafana/loki:2.4.0
ceph config set mgr mgr/cephadm/container_image_node_exporter 10.2.1.176:5000/quay.io/prometheus/node-exporter:v1.5.0
ceph config set mgr mgr/cephadm/container_image_prometheus 10.2.1.176:5000/quay.io/prometheus/prometheus:v2.43.0
ceph config set mgr mgr/cephadm/container_image_promtail 10.2.1.176:5000/docker.io/grafana/promtail:2.4.0
ceph orch redeploy prometheus
ceph orch redeploy grafana
ceph orch redeploy alertmanager
ceph orch redeploy node-exporter

It seems the redeploy is not actually needed at this point (the monitoring daemons have not been deployed yet):

[ceph: root@ceph-176 /]# ceph orch redeploy prometheus
Error EINVAL: No daemons exist under service name "prometheus". View currently running services using "ceph orch ls"
[ceph: root@ceph-176 /]# ceph orch redeploy grafana
Error EINVAL: No daemons exist under service name "grafana". View currently running services using "ceph orch ls"
[ceph: root@ceph-176 /]# ceph orch redeploy alertmanager
Error EINVAL: No daemons exist under service name "alertmanager". View currently running services using "ceph orch ls"
[ceph: root@ceph-176 /]# ceph orch redeploy node-exporter
Error EINVAL: No daemons exist under service name "node-exporter". View currently running services using "ceph orch ls"
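Whether the image overrides themselves took effect can still be verified; a quick check from the cephadm shell:

ceph config dump | grep container_image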


Startup is very slow!

Add hosts

Check the current state

[ceph: root@ceph-176 /]# ceph orch host ls
HOST      ADDR        LABELS  STATUS  
ceph-176  10.2.1.176  _admin          
1 hosts in cluster


Configure passwordless SSH (inside the cephadm shell container)

ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@ceph-191
ssh-copy-id -f -i ~/ceph.pub root@ceph-219
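A quick check that the key-based login actually works before adding the hosts (assuming root SSH access to both nodes):

ssh root@ceph-191 hostname
ssh root@ceph-219 hostname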

Add the hosts

ceph orch host add ceph-191 10.2.1.191 --labels _main
ceph orch host add ceph-219 10.2.1.219 --labels _main

ceph orch host ls

Add mon daemons

Restart the registry! (very important)

From testing it appears that after the registry starts and the images are imported, the registry container must be restarted; otherwise the firewall rules do not take effect and podman login from the other nodes fails.

podman restart registry

It simply hangs, and the restart reports errors.

Failure: adding the mon again (manifest unknown: manifest unknown)

ceph orch daemon add mon ceph-191:10.2.1.191

Check the ceph sha256 stored in the registry

Comparing the image information, the digests do not match (not clear why at this point).

podman inspect 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0
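To pull only the digest out of that inspect output, a format string can be used; a minimal sketch:

# digest of the local image as recorded by podman
podman image inspect --format '{{.Digest}}' 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0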

Set container_image (no effect!)

Try changing the container_image parameter to use the image tag instead of the sha256 digest.

ceph config set global container_image 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0

The setting has no effect!

The only option left is to investigate why the sha256 changed after the push to the private registry.

curl 10.2.1.176:5000/v2/quay.io/ceph/ceph/manifests/v18.2.0 2>&1 | grep sha256
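The digest the registry itself serves can also be read from the Docker-Content-Digest response header; a sketch, assuming plain HTTP access to the registry:

curl -sI -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
  http://10.2.1.176:5000/v2/quay.io/ceph/ceph/manifests/v18.2.0 | grep -i docker-content-digest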

The answer from ERNIE Bot (文心一言). (This problem did not seem to occur on x86_64 earlier.)

Set up NFS, share the image tarballs over, and import them manually

NFS server side

yum install nfs-utils

Configure /etc/exports (because the shared directory is under /root, no_root_squash must be set; otherwise the client has no permission to access it and the mount fails)

/root/podman-images 10.2.1.0/24(rw,sync,no_root_squash,no_subtree_check)

Restart NFS

systemctl restart nfs
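Alternatively, after editing /etc/exports the export table can be refreshed without a full service restart:

exportfs -ra   # re-read /etc/exports
exportfs -v    # list the active exports to verify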

Mount on the 191 client

mkdir /mnt/nfsmount
mount.nfs 10.2.1.176:/root/podman-images /mnt/nfsmount/

Import the image and tag it manually

[root@ceph-191 nfsmount]# podman rmi 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0
Untagged: 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0
Deleted: efeefa2a9c1238e831b9ee459aaca5772502a17bc0e9a13b653d4c56cf38ac13    
[root@ceph-191 nfsmount]# podman load -i ceph-v18.2.0.tar 
Getting image source signatures
Copying blob 687806ababb3 done  
Copying blob 148057ff0bd2 done  
Copying config efeefa2a9c done  
Writing manifest to image destination
Storing signatures
Loaded image(s): quay.io/ceph/ceph:v18.2.0
[root@ceph-191 nfsmount]# docker tag quay.io/ceph/ceph:v18.2.0 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0

Test pulling by sha256 (fails!)

[root@ceph-191 nfsmount]# podman run --rm -it 10.2.1.176:5000/quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4 /bin/true
Trying to pull 10.2.1.176:5000/quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4...
Error: initializing source docker://10.2.1.176:5000/quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4: reading manifest sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4 in 10.2.1.176:5000/quay.io/ceph/ceph: manifest unknown: manifest unknown

Set container_image again (with --force it succeeds!)

ceph config set global container_image 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0 --force

The mon on 191 recovers automatically, but after a while the value reverts again on its own! (record of the analysis process)

It should be 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0, to be continued...

[ceph: root@ceph-176 /]# ceph orch daemon add mon ceph-191:10.2.1.191
It hangs.

...

Strangely, while 191 fails, node 219 actually works, so the idea that the sha256 value is simply wrong must be incorrect!

A whole night later, container_image has been reset again.

Testing on 219 also fails to pull the image specified by sha256.

Guess: the digest

10.2.1.176:5000/quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4

is probably computed from the local

10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0

image.

podman images on 191

podman images on 219

The image id of 10.2.1.176:5000/quay.io/ceph/ceph:v18.2.0 is identical on both nodes.
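The digests podman records alongside those image ids can be listed with the --digests flag, which makes comparing the two nodes easier; a quick check:

podman images --digests | grep 'quay.io/ceph/ceph'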

On 191, pull by sha256 from the public registry (succeeds)

[root@ceph-191 ~]# podman run --rm -it quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4 /bin/true
Trying to pull quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4...
Getting image source signatures
Copying blob 88620ff853b6 done  
Copying blob 5487466c90da done  
Copying config efeefa2a9c done  
Writing manifest to image destination
Storing signatures

It runs successfully, with no errors!

[root@ceph-191 ~]# podman run --rm -it 10.2.1.176:5000/quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4 /bin/true
[root@ceph-191 ~]# 

At this point, pulling the image from 10.2.1.176:5000 no longer reports an error either!

However, podman images shows no change at all! (Is this a podman bug on aarch64?)

[root@ceph-191 ~]# podman tag quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4 10.2.1.176:5000/quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4
Error: 10.2.1.176:5000/quay.io/ceph/ceph@sha256:7919cf002430693fe20a83cc501a3cf5ec3ffe3cbb0decdf077e1955a1ceb3b4: tag by digest not supported

[root@ceph-191 ~]# podman images -a
REPOSITORY                                        TAG         IMAGE ID      CREATED        SIZE
quay.io/ceph/ceph                                 v18.2.0     efeefa2a9c12  12 days ago    1.25 GB
10.2.1.176:5000/quay.io/ceph/ceph                 v18.2.0     efeefa2a9c12  12 days ago    1.25 GB
10.2.1.176:5000/quay.io/ceph/ceph-grafana         9.4.7       dc8d789752ef  8 months ago   676 MB
10.2.1.176:5000/quay.io/prometheus/node-exporter  v1.5.0      68cb0c05b3f2  12 months ago  23.2 MB
[root@ceph-191 ~]# 

Summary of the "manifest unknown: manifest unknown" issue: pull the image by digest from the public registry once (a podman bug?)

The whole cluster is healthy now!

Add the mons again (success!)

[ceph: root@ceph-176 /]# ceph orch daemon add mon ceph-191:10.2.1.191
Deployed mon.ceph-191 on host 'ceph-191'
[ceph: root@ceph-176 /]# ceph orch daemon add mon ceph-219:10.2.1.219
Deployed mon.ceph-219 on host 'ceph-219'
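With both mons deployed, the overall state can be confirmed from the cephadm shell:

ceph orch host ls
ceph orch ps
ceph -s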

Final state

Container status on 176 (podman)

Container status on 191 (podman)

Container status on 219 (podman)
