
[LU-3139] osp_precreate_send()) ASSERTION( lu_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Labels: None
    • Fix Version/s: Lustre 2.4.0
    • Severity: 3
    • Rank: 7623

    Description

      When starting Lustre on Sequoia's MDS/MGS, it hits the following assertion:

      2013-04-09 16:46:16 Lustre: lsv-MDT0000: Will be in recovery for at least 5:00, or until 2 clients reconnect.
      2013-04-09 16:46:19 Lustre: lsv-MDT0000: Recovery over after 0:03, of 2 clients 2 recovered and 0 were evicted.
      2013-04-09 16:46:58 LustreError: 11-0: lsv-OST000c-osc-MDT0000: Communicating with 172.20.20.12@o2ib500, operation ost_connect failed with -16.
      2013-04-09 16:47:38 LustreError: 11-0: lsv-OST000b-osc-MDT0000: Communicating with 172.20.20.11@o2ib500, operation ost_connect failed with -16.
      2013-04-09 16:47:38 LustreError: Skipped 9 previous similar messages
      2013-04-09 16:48:03 LustreError: 11-0: lsv-OST0007-osc-MDT0000: Communicating with 172.20.20.7@o2ib500, operation ost_connect failed with -16.
      2013-04-09 16:48:03 LustreError: Skipped 9 previous similar messages
      2013-04-09 16:48:24 Lustre: lsv-OST0001-osc-MDT0000: Connection restored to lsv-OST0001 (at 172.20.20.1@o2ib500)
      2013-04-09 16:48:24 Lustre: lsv-OST0003-osc-MDT0000: Connection restored to lsv-OST0003 (at 172.20.20.3@o2ib500)
      2013-04-09 16:49:44 LustreError: 18017:0:(osp_precreate.c:496:osp_precreate_send()) ASSERTION( lu_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100090000:0x4c00:0x0] pre used fid [0x100090000:0x16bec0:0x0]
      2013-04-09 16:49:44 LustreError: 18017:0:(osp_precreate.c:496:osp_precreate_send()) LBUG
      

      This is an x86_64 server with ppc64 clients, running Lustre versions 2.3.63-3chaos and 2.3.63-4chaos.

      Seeing some vague similarity with LU-2895, we applied the patch from that issue with no improvement. But since this assertion is in a different function, that is not necessarily surprising.
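
      For reference, here is a simplified, self-contained sketch of the comparison behind the assertion. The field names follow Lustre's struct lu_fid, but this is only an illustration of the failing check using the values from the LBUG message, not the actual osp_precreate.c code:

      #include <stdio.h>
      #include <stdint.h>

      /* Simplified stand-in for Lustre's struct lu_fid:
       * sequence, object id, version. */
      struct lu_fid {
              uint64_t f_seq;
              uint32_t f_oid;
              uint32_t f_ver;
      };

      /* Illustrative equivalent of lu_fid_diff() within one sequence:
       * the difference of the object ids. The assertion expects the fid
       * in the OST's precreate reply to be strictly ahead of the last
       * fid the MDS has already handed out. */
      static int64_t fid_diff(const struct lu_fid *a, const struct lu_fid *b)
      {
              return (int64_t)a->f_oid - (int64_t)b->f_oid;
      }

      int main(void)
      {
              /* Values from the LBUG message above. */
              struct lu_fid reply    = { 0x100090000ULL, 0x4c00, 0 };
              struct lu_fid pre_used = { 0x100090000ULL, 0x16bec0, 0 };

              /* 0x4c00 - 0x16bec0 is negative: the OST reports a last id
               * far behind the MDS's view, so the LASSERT fires. */
              printf("diff = %lld\n", (long long)fid_diff(&reply, &pre_used));
              return 0;
      }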


        Activity

          pjones Peter Jones added a comment -

          Thanks for the update, Ned. I have dropped the priority slightly to reflect that this is still an important support issue but is not a general blocker for the release itself.


          nedbass Ned Bass (Inactive) added a comment -

          We went ahead with the proposed workaround for one affected filesystem (lscratchv, used by vulcan). We were able to bring it up under Lustre 2.3.63 without hitting this bug.

          We will do the same for the legacy Sequoia filesystem (lscratch1) tomorrow. Sequoia is already mounting a new filesystem formatted using Lustre 2.3.63, but we want to mount the old one read-only to allow data migration. I'll report back on how it goes tomorrow.

          adilger Andreas Dilger added a comment -

          Ned, there was a change with http://review.whamcloud.com/5820 (LU-2684) that affects the MDS FID storage in the LOV EA. This shouldn't affect normal Lustre operation, but there is a bit of churn in that code right now (e.g. LU-3152, LU-2888) that may affect upgraded filesystems, and it would probably be better to wait until that issue is resolved.

          nedbass Ned Bass (Inactive) added a comment -

          Niu, Alex, Di, can you think of any other on-disk format changes that may bite us after this one? I don't want to get into a state where we can't mount the filesystem under any version of Lustre.

          nedbass Ned Bass (Inactive) added a comment -

          Okay, we'll schedule a time to try out this fix. It will probably be sometime next week.
          di.wang Di Wang added a comment -

          Yes, LAST_ID should be used.


          nedbass Ned Bass (Inactive) added a comment -

          Alex, the oi.* directories are mostly empty through the ZPL. /oi.1/0x200000001:0x14:0x0 doesn't exist, but /O/0/LAST_ID and /O/0/d0/0 are there with contents as you describe. So we should use the LAST_ID file instead, correct?
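
          A minimal sketch of a helper for dumping such an 8-byte id through a ZPL mount, assuming the value is stored as a little-endian 64-bit integer (an assumption here, not something confirmed in this thread):

          #include <stdio.h>
          #include <stdint.h>

          /* Dump the 8-byte last-id value from a file such as /O/0/LAST_ID
           * on a ZPL-mounted OST dataset. Assumes little-endian on-disk
           * storage, as expected on these x86_64 servers. */
          int main(int argc, char **argv)
          {
                  unsigned char buf[8];
                  uint64_t id = 0;
                  FILE *f;

                  if (argc != 2) {
                          fprintf(stderr, "usage: %s <last_id_file>\n", argv[0]);
                          return 1;
                  }
                  f = fopen(argv[1], "rb");
                  if (f == NULL || fread(buf, 1, 8, f) != 8) {
                          perror(argv[1]);
                          return 1;
                  }
                  fclose(f);
                  for (int i = 7; i >= 0; i--)
                          id = (id << 8) | buf[i];
                  printf("%s: 0x%llx (%llu)\n", argv[1],
                         (unsigned long long)id, (unsigned long long)id);
                  return 0;
          }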

          bzzz Alex Zhuravlev added a comment -

          This is good news then. I'd suggest taking a snapshot for safety, then:

          Mount with ZPL and check the file /oi.1/0x200000001:0x14:0x0 and its content. It should be 8 bytes long and contain a number close to 0x16bec0 (the last id used on the MDS).

          The new file should be in /O/0/d0/0. It should be 8 bytes too, and the number should be much less than 0x16bec0, close to the first number you last saw in a message like:
          Apr 9 16:50:12 vesta5 kernel: Lustre: lsv-OST0005: Slow creates, 2048/1482320 objects created at a rate of 40/s

          I think it should be enough to write the content from the old file (/oi.1/0x200000001:0x14:0x0) into the new one (/O/0/d0/0); see the sketch below.

          Niu, Di, could you please confirm this suggestion is sane?
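
          A minimal sketch of that copy, assuming the safety snapshot has already been taken; this is an untested illustration of the proposed write, not a validated tool:

          #include <stdio.h>

          /* Copy the 8-byte last-id from the old location
           * (/oi.1/0x200000001:0x14:0x0) to the new one (/O/0/d0/0),
           * both reached through a ZPL mount of the OST dataset. */
          int main(int argc, char **argv)
          {
                  unsigned char buf[8];
                  FILE *in, *out;

                  if (argc != 3) {
                          fprintf(stderr, "usage: %s <old_file> <new_file>\n", argv[0]);
                          return 1;
                  }
                  in = fopen(argv[1], "rb");
                  if (in == NULL || fread(buf, 1, 8, in) != 8) {
                          perror(argv[1]);
                          return 1;
                  }
                  fclose(in);
                  out = fopen(argv[2], "r+b"); /* overwrite in place, no truncate */
                  if (out == NULL || fwrite(buf, 1, 8, out) != 8) {
                          perror(argv[2]);
                          return 1;
                  }
                  fclose(out);
                  return 0;
          }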

          nedbass Ned Bass (Inactive) added a comment -

          Alex, BTW this filesystem will not be long-lived due to the risk of these on-disk incompatibilities. We will provide a newly-formatted filesystem for users that will coexist in the same zpools as this legacy one. We just need to be able to mount the legacy one under 2.3.63+ long enough for users to migrate their data.

          nedbass Ned Bass (Inactive) added a comment -

          We can mount it with ZPL. There is some strange behavior, like . or .. missing or showing up twice, or incorrect hard link counts. But we can read/write/open/close local objects like LAST_ID, last_rcvd, lov_objid, etc.

          bzzz Alex Zhuravlev added a comment -

          AFAIU, yes, it can usually be mounted with ZPL. But this may not work for the old filesystem, as compatibility with ZPL was implemented just before the landing in September, IIRC.

          People

            Assignee:
            niu Niu Yawei (Inactive)
            Reporter:
            morrone Christopher Morrone (Inactive)
            Votes:
            0
            Watchers:
            9
