Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
Lustre 2.4.0
-
3
-
7623
Description
When starting Lustre on Sequoia's MDS/MGS, we hit the following assertion:
2013-04-09 16:46:16 Lustre: lsv-MDT0000: Will be in recovery for at least 5:00, or until 2 clients reconnect.
2013-04-09 16:46:19 Lustre: lsv-MDT0000: Recovery over after 0:03, of 2 clients 2 recovered and 0 were evicted.
2013-04-09 16:46:58 LustreError: 11-0: lsv-OST000c-osc-MDT0000: Communicating with 172.20.20.12@o2ib500, operation ost_connect failed with -16.
2013-04-09 16:47:38 LustreError: 11-0: lsv-OST000b-osc-MDT0000: Communicating with 172.20.20.11@o2ib500, operation ost_connect failed with -16.
2013-04-09 16:47:38 LustreError: Skipped 9 previous similar messages
2013-04-09 16:48:03 LustreError: 11-0: lsv-OST0007-osc-MDT0000: Communicating with 172.20.20.7@o2ib500, operation ost_connect failed with -16.
2013-04-09 16:48:03 LustreError: Skipped 9 previous similar messages
2013-04-09 16:48:24 Lustre: lsv-OST0001-osc-MDT0000: Connection restored to lsv-OST0001 (at 172.20.20.1@o2ib500)
2013-04-09 16:48:24 Lustre: lsv-OST0003-osc-MDT0000: Connection restored to lsv-OST0003 (at 172.20.20.3@o2ib500)
2013-04-09 16:49:44 LustreError: 18017:0:(osp_precreate.c:496:osp_precreate_send()) ASSERTION( lu_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100090000:0x4c00:0x0] pre used fid [0x100090000:0x16bec0:0x0]
2013-04-09 16:49:44 LustreError: 18017:0:(osp_precreate.c:496:osp_precreate_send()) LBUG
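To make the failed check concrete, here is a small sketch that pulls the two FIDs out of the assertion message and compares them. It is not Lustre code: parse_fids() and fid_diff() are hypothetical helpers written for this illustration, and fid_diff() is only a simplified stand-in that assumes lu_fid_diff() amounts to an object-ID difference within a single sequence.

```python
import re

# Matches a FID printed as [seq:oid:ver] in hexadecimal, as in the LBUG message.
FID_RE = re.compile(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]")


def parse_fids(message):
    """Return every (seq, oid, ver) triple found in a log message."""
    return [tuple(int(part, 16) for part in m.groups())
            for m in FID_RE.finditer(message)]


def fid_diff(a, b):
    """Simplified stand-in (assumption) for lu_fid_diff(): the object-ID
    difference between two FIDs that share a sequence."""
    if a[0] != b[0]:
        raise ValueError("FIDs belong to different sequences")
    return a[1] - b[1]


# Assertion text copied from the console log above.
line = ("ASSERTION( lu_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: "
        "reply fid [0x100090000:0x4c00:0x0] "
        "pre used fid [0x100090000:0x16bec0:0x0]")

reply_fid, pre_used_fid = parse_fids(line)
diff = fid_diff(reply_fid, pre_used_fid)

# The assertion requires diff > 0; here the reply's object ID is smaller.
print(f"reply oid = {reply_fid[1]}, pre-used oid = {pre_used_fid[1]}, diff = {diff}")
```

Under that assumption, both FIDs share sequence 0x100090000, but the object ID in the reply (0x4c00) is lower than the one the MDT has recorded as already used (0x16bec0), so the difference is negative and the `> 0` assertion fails.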
This is an x86_64 server with ppc64 clients. Lustre versions 2.3.63-3chaos and 2.3.63-4chaos.
Seeing some vague similarity with LU-2895, we applied the patch from that issue, with no improvement. That is not necessarily surprising, since this assertion is in a different function.
We went ahead with the proposed workaround for one affected filesystem (lscratchv, used by vulcan). We were able to bring it up under Lustre 2.3.63 without hitting this bug.
We will do the same for the legacy Sequoia filesystem (lscratch1) tomorrow. Sequoia is already mounting a new filesystem formatted using Lustre 2.3.63, but we want to mount the old one read-only to allow data migration. I'll report back on how it goes tomorrow.