[LU-16287] replay-single: test_102d timeout Created: 02/Nov/22 Updated: 03/Nov/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/aec823a8-3961-47ad-b2b5-d8a97f1a242f

[Wed Nov 2 05:35:20 2022] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 102d: check replay & reconstruction with multiple mod RPCs in flight ========================================================== 05:35:36 \(1667367336\)
[Wed Nov 2 05:35:21 2022] Lustre: DEBUG MARKER: == replay-single test 102d: check replay
[Wed Nov 2 05:35:21 2022] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x15a
[Wed Nov 2 05:35:21 2022] Lustre: *** cfs_fail_loc=15a, val=0***
[Wed Nov 2 05:35:21 2022] Lustre: Skipped 5 previous similar messages
[Wed Nov 2 05:35:23 2022] Lustre: DEBUG MARKER: sync; sync; sync
[Wed Nov 2 05:35:24 2022] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0
[Wed Nov 2 05:35:24 2022] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
[Wed Nov 2 05:35:25 2022] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
[Wed Nov 2 05:35:25 2022] Lustre: lustre-MDT0003: Not available for connect from 10.240.29.171@tcp (stopping)
[Wed Nov 2 05:35:25 2022] Lustre: Skipped 7 previous similar messages
[Wed Nov 2 05:35:29 2022] Lustre: lustre-MDT0003: Not available for connect from 10.240.29.168@tcp (stopping)
[Wed Nov 2 05:35:29 2022] Lustre: Skipped 10 previous similar messages
[Wed Nov 2 05:35:34 2022] Lustre: lustre-MDT0003: Not available for connect from 10.240.29.168@tcp (stopping)
[Wed Nov 2 05:35:34 2022] Lustre: Skipped 12 previous similar messages
[Wed Nov 2 05:35:42 2022] Lustre: lustre-MDT0003: Not available for connect from 0@lo (stopping)
[Wed Nov 2 05:35:42 2022] Lustre: Skipped 24 previous similar messages
[Wed Nov 2 05:35:45 2022] LustreError: 190662:0:(import.c:355:ptlrpc_invalidate_import()) lustre-MDT0001_UUID: timeout waiting for callback (1 != 0)
[Wed Nov 2 05:35:45 2022] LustreError: 190662:0:(import.c:383:ptlrpc_invalidate_import()) @@@ still on delayed list req@00000000e4286085 x1748352857944832/t0(0) o41->lustre-MDT0001-osp-MDT0003@0@lo:24/4 lens 224/224 e 0 to 0 dl 1670132019 ref 1 fl Rpc:RESQU/0/0 rc -5/-107 job:'osp-pre-1-3.0'
[Wed Nov 2 05:35:45 2022] LustreError: 190662:0:(import.c:389:ptlrpc_invalidate_import()) lustre-MDT0001_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting for them to error out.

The log shows that umount of /mnt/lustre-mds4 is stuck in ptlrpc_invalidate_import(), and that the remaining request is a statfs between MDTs. |
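For reference, the DEBUG MARKER lines above correspond roughly to the following shell sequence. This is a minimal sketch, not the test script itself: the fail_loc value and mount point are taken from the log, and the intervening test workload is elided.

    # Inject the fault used by test_102d (fail_loc 0x15a) so that multiple
    # modify RPCs are in flight and must be reconstructed on replay.
    lctl set_param fail_loc=0x15a

    # ... test workload against the filesystem runs here ...

    sync; sync; sync
    lctl set_param fail_loc=0

    # Check that the MDT is still mounted, then unmount it. Per the
    # analysis above, this umount is what hangs in ptlrpc_invalidate_import().
    grep -c '/mnt/lustre-mds4 ' /proc/mounts || true
    umount -d /mnt/lustre-mds4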
| Comments |
| Comment by Andreas Dilger [ 03/Nov/22 ] |
|
I think this failure is caused by testing of the patch https://review.whamcloud.com/48584 "LU-16159 lod: cancel update llogs upon recovery abort". In addition to the 2x failures in replay-single from autotest, there were 20x failures in the same replay-single subtest from the Gerrit Janitor: |
| Comment by Lai Siyao [ 03/Nov/22 ] |
|
Indeed, but the Janitor failure is an exception: replay-single 100c should be skipped on 2.15.52 (as it is in autotest), but Janitor did not skip it. |
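For context, subtests in Lustre's test-framework.sh are usually excluded with a version check plus skip at the top of the test function. A hypothetical sketch of that pattern for 100c follows; the gate direction and the skip message are assumptions, not the actual test source.

    test_100c() {
        # Hypothetical version gate: skip on 2.15.52 and later servers,
        # matching the skip seen in autotest but not applied by Janitor.
        [ "$MDS1_VERSION" -ge $(version_code 2.15.52) ] &&
            skip "not run on 2.15.52 and later"
        # ... actual test body ...
    }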