Details
- Bug
- Resolution: Fixed
- Major
- None
- None
- CentOS 7.9, Lustre 2.12.7
- 3
- 9223372036854775807
Description
After an abrupt power outage, we started to see an assertion that causes a kernel panic on the MDT. The attached vmcore-dmesg.txt shows the following errors:
[ 6867.143694] Lustre: 79890:0:(llog.c:615:llog_process_thread()) lustre01-OST0032-osc-MDT0000: invalid length 0 in llog [0x52ab:0x1:0x0]record for index 0/2
[ 6867.143705] Lustre: 79890:0:(llog.c:615:llog_process_thread()) Skipped 1 previous similar message
[ 6867.143720] LustreError: 79890:0:(osp_sync.c:1272:osp_sync_thread()) lustre01-OST0032-osc-MDT0000: llog process with osp_sync_process_queues failed: -22
[ 6867.148800] LustreError: 79890:0:(osp_sync.c:1272:osp_sync_thread()) Skipped 1 previous similar message
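For anyone triaging a similar vmcore-dmesg.txt, the sketch below shows one way to pull the device name, the bracketed llog identifier, and the return code out of lines like the ones above. It is only an illustration: the regular expressions are inferred from this excerpt, the function name `parse_line` and the dictionary keys are hypothetical, and other Lustre messages may not follow the same layout. The one fact it relies on is that return code -22 corresponds to errno 22, EINVAL (invalid argument).

```python
import errno
import re

# Prefix layout inferred from the excerpt above (an assumption, not a Lustre spec):
# "[ uptime] Lustre|LustreError: pid:0:(file:line:func()) message"
LINE_RE = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"(?P<level>Lustre(?:Error)?):\s+"
    r"(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<msg>.*)$"
)
# Leading "device: " part of the message, e.g. "lustre01-OST0032-osc-MDT0000: ..."
DEVICE_RE = re.compile(r"^([\w.-]+):\s")
# Bracketed identifier made of three hex fields, e.g. "[0x52ab:0x1:0x0]"
ID_RE = re.compile(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]")
# Trailing negative return code, e.g. "failed: -22"
RC_RE = re.compile(r"failed:\s*(-\d+)\s*$")


def parse_line(line):
    """Return a dict of fields for one console-log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    msg = rec["msg"]

    dev = DEVICE_RE.match(msg)
    rec["device"] = dev.group(1) if dev else None

    ident = ID_RE.search(msg)
    # Store the three hex fields as integers; None when the line has no identifier.
    rec["llog_id"] = tuple(int(x, 16) for x in ident.groups()) if ident else None

    rc = RC_RE.search(msg)
    rec["rc"] = int(rc.group(1)) if rc else None
    # Map the negative kernel return code to its symbolic errno name.
    rec["errno_name"] = errno.errorcode.get(-rec["rc"]) if rec["rc"] is not None else None
    return rec
```

Running `parse_line` over each line of the dmesg file and keeping the records where `rc` or `llog_id` is set gives a quick list of which llog and which OSP device are involved, which is the information needed to decide which OST to leave unmounted while investigating.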
After a reboot, we can start the MDT and it completes recovery. With all the other OSTs brought online, the file system stays up. When the OST named in the log (lustre01-OST0032) is mounted, however, we hit the same kernel panic/assertion.
I mounted the MDT as ldiskfs and tried clearing the update log and changelog, but this made no difference.
Upgrading to Lustre 2.12.9 resolved this issue. I'm not sure exactly which change handles the error better, but this ticket can now be closed.
Thank you.