[LU-13323] sanity test_160a: timeout Created: 04/Mar/20 Updated: 04/Mar/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for S Buisson <sbuisson@ddn.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f0d9c514-dca5-4f09-8180-068456c0d0c8 test_160a failed with the following error: Timeout occurred after 160 mins, last suite running was sanity After MDT failover, it seems clients never reconnect. [ 5987.459946] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 [ 5987.641616] Lustre: Failing over lustre-MDT0000 [ 5988.179879] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.5.43@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. [ 5988.182823] LustreError: Skipped 6 previous similar messages [ 5993.848997] Lustre: 25449:0:(client.c:2228:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583256866/real 1583256866] req@ffff939a9e2c4900 x1660159018546432/t0(0) o251->MGC10.9.6.10@tcp@10.9.6.10@tcp:26/25 lens 224/224 e 0 to 1 dl 1583256872 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'umount.0' [ 5994.048821] Lustre: server umount lustre-MDT0000 complete [ 5994.292111] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && [ 5994.292111] lctl dl | grep ' ST ' || true [ 5994.650156] Lustre: DEBUG MARKER: modprobe dm-flakey; [ 5994.650156] dmsetup targets | grep -q flakey [ 5995.000133] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1 [ 5995.350091] Lustre: DEBUG MARKER: modprobe dm-flakey; [ 5995.350091] dmsetup targets | grep -q flakey [ 5995.696047] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 [ 5996.039360] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1 [ 5996.384483] Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey [ 5996.727194] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey [ 5997.067827] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -olocalrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 [ 5997.294752] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc [ 5997.326213] Lustre: osd-ldiskfs create tunables for lustre-MDT0000 [ 5997.721570] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180 [ 5997.773182] Lustre: lustre-MDD0000: changelog on [ 5997.774238] Lustre: 26177:0:(mdd_device.c:545:mdd_changelog_llog_init()) lustre-MDD0000 : orphan changelog records found, starting from index 0 to index 16, being cleared now [ 5997.778242] Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect On client1: [ 6075.180830] Lustre: lustre-MDT0000-mdc-ffff9513656af800: Connection to lustre-MDT0000 (at 10.9.6.10@tcp) was lost; in progress operations using this service will wait for recovery to complete [ 6082.181294] Lustre: 19983:0:(client.c:2228:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583256870/real 1583256870] req@ffff951364fd4480 x1660164743953984/t0(0) o400->MGC10.9.6.10@tcp@10.9.6.10@tcp:26/25 lens 224/224 e 0 to 1 dl 1583256877 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/0:1.0' [ 6082.186243] LustreError: 166-1: MGC10.9.6.10@tcp: Connection to MGS (at 10.9.6.10@tcp) was lost; in progress operations using this service will fail On client2: [ 6072.132648] Lustre: lustre-MDT0000-mdc-ffff92943a687000: Connection to lustre-MDT0000 (at 10.9.6.10@tcp) was lost; in progress operations using this service will wait for recovery to complete [ 6079.133314] Lustre: 11180:0:(client.c:2228:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583256867/real 1583256867] req@ffff929423be0900 x1660159108778048/t0(0) o400->MGC10.9.6.10@tcp@10.9.6.10@tcp:26/25 lens 224/224 e 0 to 1 dl 1583256874 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/1:2.0' [ 6079.136248] LustreError: 166-1: MGC10.9.6.10@tcp: Connection to MGS (at 10.9.6.10@tcp) was lost; in progress operations using this service will fail VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |