Details
Bug
Resolution: Unresolved
Minor
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/9c32c9af-c574-4023-9286-07091f92769c
test_29 failed with the following error:
Started lustre-MDT0001 Timeout occurred after 135 mins, last suite running was replay-dual
It looks like the MDS is having trouble reading the recovery llog and is stuck doing this forever with "retry remote llog process":
[Mon Dec 27 22:49:21 2021] LustreError: 113045:0:(llog.c:472:llog_verify_record()) lustre-MDT0000-osp-MDT0001: record is too large: 0 > 32768
[Mon Dec 27 22:49:21 2021] LustreError: 113045:0:(llog.c:656:llog_process_thread()) lustre-MDT0000-osp-MDT0001: invalid record in llog [0x2:0x11d41:0x2] record for index 0/2: rc = -22
[Mon Dec 27 22:49:21 2021] LustreError: 113045:0:(llog.c:482:llog_verify_record()) lustre-MDT0000-osp-MDT0001: magic 0 is bad
[Mon Dec 27 22:49:21 2021] LustreError: 113045:0:(llog.c:781:llog_process_thread()) lustre-MDT0000-osp-MDT0001 retry remote llog process
[Mon Dec 27 22:49:22 2021] Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
[Mon Dec 27 22:49:22 2021] LustreError: 113045:0:(llog.c:472:llog_verify_record()) lustre-MDT0000-osp-MDT0001: record is too large: 400547 > 32768
[Mon Dec 27 22:49:22 2021] LustreError: 113045:0:(llog.c:472:llog_verify_record()) Skipped 205 previous similar messages
[Mon Dec 27 22:49:22 2021] LustreError: 113045:0:(llog.c:656:llog_process_thread()) lustre-MDT0000-osp-MDT0001: invalid record in llog [0x2:0x11d41:0x2] record for index 96/0: rc = -22
[Mon Dec 27 22:49:22 2021] LustreError: 113045:0:(llog.c:656:llog_process_thread()) Skipped 309 previous similar messages
:
:
[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:482:llog_verify_record()) lustre-MDT0000-osp-MDT0001: magic 0 is bad
[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:482:llog_verify_record()) Skipped 129784 previous similar messages
[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:781:llog_process_thread()) lustre-MDT0000-osp-MDT0001 retry remote llog process
[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:781:llog_process_thread()) Skipped 32445 previous similar messages
[Mon Dec 27 23:36:29 2021] Lustre: 113052:0:(ldlm_lib.c:1962:extend_recovery_timer()) lustre-MDT0001: extended recovery timer reached hard limit: 180, extend: 1
[Mon Dec 27 23:36:29 2021] Lustre: 113052:0:(ldlm_lib.c:1962:extend_recovery_timer()) Skipped 29 previous similar messages
[Mon Dec 27 23:46:25 2021] LustreError: 113045:0:(llog.c:472:llog_verify_record()) lustre-MDT0000-osp-MDT0001: record is too large: 0 > 32768
[Mon Dec 27 23:46:25 2021] LustreError: 113045:0:(llog.c:472:llog_verify_record()) Skipped 258999 previous similar messages
[Mon Dec 27 23:46:25 2021] LustreError: 113045:0:(llog.c:656:llog_process_thread()) lustre-MDT0000-osp-MDT0001: invalid record in llog [0x2:0x11d41:0x2] record for index 0/0: rc = -22
[Mon Dec 27 23:46:25 2021] LustreError: 113045:0:(llog.c:781:llog_process_thread()) Skipped 388499 previous similar messages
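To get a rough sense of how tight the retry loop is, the console log above can be tallied per source location, adding the "Skipped N previous similar messages" counts to the lines that were actually printed. The following is only a triage sketch and not part of Lustre or Maloo: the helper name tally() and the regular expressions are assumptions based on the message format seen in this log, and the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Assumed shape, taken from the excerpt in this ticket:
#   [timestamp] Lustre[Error]: pid:0:(file.c:line:func()) message
ENTRY = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+Lustre(?:Error)?:\s+"
    r"\d+:\d+:\((?P<src>[\w.]+:\d+:\w+\(\))\)\s+(?P<msg>.*)"
)
SKIPPED = re.compile(r"Skipped (\d+) previous similar messages?")


def tally(lines):
    """Count messages per source location, including rate-limited repeats."""
    counts = Counter()
    for line in lines:
        m = ENTRY.search(line)
        if m is None:
            continue  # lines without a pid:0:(file:line:func()) prefix
        skipped = SKIPPED.match(m.group("msg"))
        if skipped:
            # A "Skipped N" line stands in for N suppressed messages.
            counts[m.group("src")] += int(skipped.group(1))
        else:
            counts[m.group("src")] += 1
    return counts


# Sample lines copied from the 23:36:25 burst in the log above.
sample = [
    "[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:482:llog_verify_record()) lustre-MDT0000-osp-MDT0001: magic 0 is bad",
    "[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:482:llog_verify_record()) Skipped 129784 previous similar messages",
    "[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:781:llog_process_thread()) lustre-MDT0000-osp-MDT0001 retry remote llog process",
    "[Mon Dec 27 23:36:25 2021] LustreError: 113045:0:(llog.c:781:llog_process_thread()) Skipped 32445 previous similar messages",
]

counts = tally(sample)
for src, n in counts.most_common():
    print(f"{n:>8}  {src}")
```

Feeding the full MDS console log through tally() instead of the four sample lines should show whether the "retry remote llog process" count keeps growing for the whole 135 minutes, which would be consistent with the thread never giving up on the bad llog.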
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-dual test_29 - Timeout occurred after 135 mins, last suite running was replay-dual