[LU-17378] obdfilter-survey test_3a: Timeout occurred

| Created: | 19/Dec/23 | Updated: | 19/Dec/23 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/bfbdbdae-6c4b-4ef0-abb4-ba956983c322

test_3a failed with the following error:

Timeout occurred after 576 minutes, last suite running was obdfilter-survey

Test session details:
<<Please provide additional information about the failure here>>

MDS console:

```
[34451.919649] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == obdfilter-survey test 3a: Network survey ============== 23:18:40 \(1701299920\)
[34452.111000] Lustre: DEBUG MARKER: == obdfilter-survey test 3a: Network survey ============== 23:18:40 (1701299920)
[34453.201758] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[34453.514311] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
[34457.041800] Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[34457.044694] Lustre: Skipped 30 previous similar messages
[34457.045979] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[34457.047440] Lustre: Skipped 48 previous similar messages
[34460.050142] LustreError: 1067372:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[34460.051957] LustreError: 1067372:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 40 previous similar messages
[34460.108661] Lustre: server umount lustre-MDT0000 complete
...
[34461.559320] Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey
[34461.882390] Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1
[34462.161732] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[34462.164874] LustreError: Skipped 64 previous similar messages
[34462.196821] Lustre: DEBUG MARKER: modprobe -r dm-flakey
[34462.987368] LustreError: 1060464:0:(ldlm_lockd.c:2521:ldlm_cancel_handler()) ldlm_cancel from 10.240.39.170@tcp arrived at 1701299931 with bad export cookie 10225034501323090856
[34462.990356] LustreError: 1060464:0:(ldlm_lockd.c:2521:ldlm_cancel_handler()) Skipped 14 previous similar messages
[34463.185819] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.39.170@tcp failed: rc = -107
[34463.187938] LustreError: Skipped 4 previous similar messages
[34469.328952] Lustre: 12617:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1701299931/real 1701299931] req@0000000037606504 x1783906650521728/t0(0) o400->MGC10.240.40.14@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1701299938 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:3.0'
[34469.334831] Lustre: 12617:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[34469.336851] LustreError: 166-1: MGC10.240.40.14@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
[34469.339375] LustreError: Skipped 8 previous similar messages
```
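For context, test 3a drives the `case=network` mode of the lustre-iokit `obdfilter-survey` script, which measures raw LNET bandwidth between nodes and requires the Lustre targets to be unmounted first; this matches the umount and dm-flakey teardown at the top of the console log. Below is a minimal sketch of a standalone invocation in the style documented in the Lustre manual; the tuning values and `server-nid` are illustrative placeholders, not the settings used in this autotest session.

```sh
# Hedged sketch of the network survey case (not the exact autotest command).
# "server-nid" and all tuning values below are placeholder assumptions.
# Lustre must be unmounted on the participating nodes, since the survey
# exercises LNET directly via the obdecho/echo_client modules.
size=1024 thrhi=8 nobjhi=1 case=network targets="server-nid" \
    sh obdfilter-survey
```

If a standalone run like this completes quickly, re-running just this subtest in the test framework (typically `ONLY=3a` with `lustre/tests/obdfilter-survey.sh`) may help distinguish a genuine LNET stall from a suite-level timeout.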