[LU-12684] MDT failed to mount during failover due to LNetError Created: 22/Aug/19 Updated: 22/Aug/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
lustre-b2_12-ib #35 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
SOAK hit the following error during MDS failover after running for 4 days. During failover, MDT3 failed to mount on soak-10 due to a network error.

syslog on soak-10

Aug 19 21:26:21 soak-10 kernel: LNetError: 12284:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Aug 19 21:26:21 soak-10 kernel: LNetError: 12284:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 192.168.1.111@o2ib (9): c: 5, oc: 0, rc: 8
Aug 19 21:26:21 soak-10 kernel: Lustre: 12317:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566249976/real 1566249981] req@ffff9678ecee8900 x1642318302104592/t0(0) o41->soaked-MDT0003-osp-MDT0002@192.168.1.111@o2ib:24/4 lens 224/368 e 0 to 1 dl 1566250020 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 19 21:26:21 soak-10 kernel: Lustre: soaked-MDT0003-osp-MDT0002: Connection to soaked-MDT0003 (at 192.168.1.111@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Aug 19 21:26:21 soak-10 kernel: Lustre: Skipped 3 previous similar messages
Aug 19 21:26:21 soak-10 kernel: Lustre: 12317:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 19 21:26:24 soak-10 multipathd: 360080e50001fedb80000015952012962: sdi - rdac checker reports path is ghost

console log on soak-10

[13420.972492] LNetError: 12284:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[13420.983876] LNetError: 12284:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 192.168.1.111@o2ib (9): c: 5, oc: 0, rc: 8
[13420.997683] Lustre: 12317:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566249976/real 1566249981] req@ffff9678ecee8900 x1642318302104592/t0(0) o41->soaked-MDT0003-osp-MDT0002@192.168.1.111@o2ib:24/4 lens 224/368 e 0 to 1 dl 1566250020 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
[13420.997711] Lustre: soaked-MDT0003-osp-MDT0002: Connection to soaked-MDT0003 (at 192.168.1.111@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[13420.997714] Lustre: Skipped 3 previous similar messages
[13421.054508] Lustre: 12317:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[13423.880342] device-mapper: multipath: Reinstating path 8:128.
[13423.887079] device-mapper: multipath: Failing path 8:128. |
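
For triage, the sent/real/dl values in the ptlrpc_expire_one_request() message can be decoded to line the failed request up against the syslog timestamps. The following is a minimal sketch, assuming those three fields are Unix epoch seconds; the helper name decode_request_times and the regular expression are illustrative and not part of Lustre.

```python
import re
from datetime import datetime, timezone

# Message text copied from the soak-10 log in this ticket.
LINE = (
    "Request sent has failed due to network error: "
    "[sent 1566249976/real 1566249981] req@ffff9678ecee8900 "
    "x1642318302104592/t0(0) "
    "o41->soaked-MDT0003-osp-MDT0002@192.168.1.111@o2ib:24/4 "
    "lens 224/368 e 0 to 1 dl 1566250020 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1"
)

# Hypothetical pattern: captures the sent, real and dl (deadline) values.
PATTERN = re.compile(r"\[sent (\d+)/real (\d+)\].*? dl (\d+) ")


def to_utc(seconds):
    """Render epoch seconds as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def decode_request_times(line):
    """Extract sent/real/dl from a log line (assumed to be epoch seconds)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("no sent/real/dl fields found")
    sent, real, deadline = (int(group) for group in match.groups())
    return {
        "sent_utc": to_utc(sent),
        "real_utc": to_utc(real),
        "deadline_utc": to_utc(deadline),
        "real_minus_sent": real - sent,
        "deadline_minus_sent": deadline - sent,
    }


if __name__ == "__main__":
    for key, value in decode_request_times(LINE).items():
        print(f"{key}: {value}")
```

Under that assumption, the "real" value converts to 21:26:21 UTC on 19 Aug 2019, which matches the syslog timestamp of the LNetError lines, and the request failed well before its deadline (dl) expired.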