Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.13.0
- 3
- 9223372036854775807
Description
Soak triggered mds_failover testing. According to soak.log, MDT0003 from the failing MDS (soak-11) should have been mounted on its failover partner, soak-10, but it was not.
soak.log
2019-11-19 01:22:13,931:fsmgmt.fsmgmt:INFO trying to connect to soak-11 ...
2019-11-19 01:22:20,107:fsmgmt.fsmgmt:INFO trying to connect to soak-11 ...
2019-11-19 01:22:25,285:fsmgmt.fsmgmt:INFO trying to connect to soak-11 ...
2019-11-19 01:22:26,296:fsmgmt.fsmgmt:INFO soak-11 is up!!!
2019-11-19 01:22:37,308:fsmgmt.fsmgmt:INFO Failing over soaked-MDT0003 ...
2019-11-19 01:22:37,308:fsmgmt.fsmgmt:INFO Mounting soaked-MDT0003 on soak-10 ...
Here is the console log on soak-10 from around that time:
[17741.278456] device-mapper: multipath: Failing path 8:128.
[17746.279544] device-mapper: multipath: Reinstating path 8:128.
[17746.286032] device-mapper: multipath: Failing path 8:128.
[17747.871994] LNetError: 6527:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[17747.883281] LNetError: 6527:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 192.168.1.111@o2ib (10): c: 7, oc: 0, rc: 8
[17747.897005] LNetError: 6533:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni 192.168.1.111@o2ib added to recovery queue. Health = 900
[17747.911953] LNetError: 20538:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 192.168.1.110@o2ib added to recovery queue. Health = 900
[17747.925462] LNetError: 20538:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 5 previous similar messages
[17747.937096] Lustre: 6550:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1574126428/real 1574126433] req@ffff899fea070000 x1650580018692480/t0(0) o41->soaked-MDT0003-osp-MDT0002@192.168.1.111@o2ib:24/4 lens 224/368 e 0 to 1 dl 1574126435 ref 1 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:''
[17747.970231] Lustre: 6550:0:(client.c:2219:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[17747.980974] Lustre: soaked-MDT0003-osp-MDT0002: Connection to soaked-MDT0003 (at 192.168.1.111@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[17751.292982] device-mapper: multipath: Reinstating path 8:128.
[17751.299695] device-mapper: multipath: Failing path 8:128.
[17756.300695] device-mapper: multipath: Reinstating path 8:128.
[17756.307377] device-mapper: multipath: Failing path 8:128.
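For triage, it may help to count how often each multipath path is failed and reinstated in a console capture like the one above. The following is a minimal sketch, not part of the soak framework: the function name `summarize_path_events` and the regular expression are illustrative assumptions based only on the message format visible in this excerpt.

```python
import re
from collections import Counter

# Assumed message format, taken from the console excerpt in this ticket:
#   [17741.278456] device-mapper: multipath: Failing path 8:128.
PATH_EVENT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+device-mapper: multipath: "
    r"(?P<event>Failing|Reinstating) path (?P<path>\d+:\d+)\."
)


def summarize_path_events(console_text):
    """Count Failing/Reinstating events per (path, event) pair.

    finditer() scans the whole text, so this works whether the log
    has one entry per line or was flattened onto a single line.
    """
    counts = Counter()
    for match in PATH_EVENT.finditer(console_text):
        counts[(match.group("path"), match.group("event"))] += 1
    return counts


# Hypothetical usage with the first three entries of the excerpt.
sample = (
    "[17741.278456] device-mapper: multipath: Failing path 8:128. "
    "[17746.279544] device-mapper: multipath: Reinstating path 8:128. "
    "[17746.286032] device-mapper: multipath: Failing path 8:128."
)
counts = summarize_path_events(sample)
for (path, event), n in sorted(counts.items()):
    print(f"path {path}: {event} x{n}")
```

Run against the full soak-10 excerpt, this kind of count should show path 8:128 being failed four times and reinstated three times within roughly 15 seconds. Whether that flapping is related to MDT0003 not being mounted is not established by the logs shown here.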