[LU-17378] obdfilter-survey test_3a: Timeout occurred Created: 19/Dec/23  Updated: 19/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.4
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/bfbdbdae-6c4b-4ef0-abb4-ba956983c322

test_3a failed with the following error:

Timeout occurred after 576 minutes, last suite running was obdfilter-survey

Test session details:
clients: https://build.whamcloud.com/job/lustre-b2_15/77 - 4.18.0-477.15.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-b2_15/77 - 4.18.0-477.15.1.el8_lustre.x86_64

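Test 3a exercises obdfilter-survey in its network mode (case=network), which measures LNET throughput between client and server rather than backend disk bandwidth. As a rough triage aid, a minimal sketch of reproducing the survey by hand is below; the NID is taken from the console log purely as an example, while the size value, script paths, and the auster invocation are assumptions based on a standard lustre-iokit install, not values confirmed by this run:

# Run the survey in network mode from a client node (lustre-iokit package).
# case=network surveys LNET throughput to the given server NID instead of
# local obdfilter devices; size is in MB (illustrative value).
size=1024 case=network targets="10.240.40.14@tcp" obdfilter-survey

# Or re-run only the failing subtest through the Lustre test framework
# (installed-tests path assumed):
cd /usr/lib64/lustre/tests && ./auster -v obdfilter-survey --only 3a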

MDS console

[34451.919649] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == obdfilter-survey test 3a: Network survey ============== 23:18:40 \(1701299920\)
[34452.111000] Lustre: DEBUG MARKER: == obdfilter-survey test 3a: Network survey ============== 23:18:40 (1701299920)
[34453.201758] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[34453.514311] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
[34457.041800] Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[34457.044694] Lustre: Skipped 30 previous similar messages
[34457.045979] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[34457.047440] Lustre: Skipped 48 previous similar messages
[34460.050142] LustreError: 1067372:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[34460.051957] LustreError: 1067372:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 40 previous similar messages
[34460.108661] Lustre: server umount lustre-MDT0000 complete
...
[34461.559320] Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey
[34461.882390] Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1
[34462.161732] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[34462.164874] LustreError: Skipped 64 previous similar messages
[34462.196821] Lustre: DEBUG MARKER: modprobe -r dm-flakey
[34462.987368] LustreError: 1060464:0:(ldlm_lockd.c:2521:ldlm_cancel_handler()) ldlm_cancel from 10.240.39.170@tcp arrived at 1701299931 with bad export cookie 10225034501323090856
[34462.990356] LustreError: 1060464:0:(ldlm_lockd.c:2521:ldlm_cancel_handler()) Skipped 14 previous similar messages
[34463.185819] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.39.170@tcp failed: rc = -107
[34463.187938] LustreError: Skipped 4 previous similar messages
[34469.328952] Lustre: 12617:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1701299931/real 1701299931]  req@0000000037606504 x1783906650521728/t0(0) o400->MGC10.240.40.14@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1701299938 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:3.0'
[34469.334831] Lustre: 12617:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[34469.336851] LustreError: 166-1: MGC10.240.40.14@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
[34469.339375] LustreError: Skipped 8 previous similar messages

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
obdfilter-survey test_3a - Timeout occurred after 576 minutes, last suite running was obdfilter-survey

