
osp_precreate_send()) ASSERTION( lu_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.4.0
    • 3
    • 7623

    Description

      When starting lustre on Sequoia's MDS/MGS, it is hitting the following assertion:

      2013-04-09 16:46:16 Lustre: lsv-MDT0000: Will be in recovery for at least 5:00, or until 2 clients reconnect.
      2013-04-09 16:46:19 Lustre: lsv-MDT0000: Recovery over after 0:03, of 2 clients 2 recovered and 0 were evicted.
      2013-04-09 16:46:58 LustreError: 11-0: lsv-OST000c-osc-MDT0000: Communicating with 172.20.20.12@o2ib500, operation ost_connect failed with -16.
      2013-04-09 16:47:38 LustreError: 11-0: lsv-OST000b-osc-MDT0000: Communicating with 172.20.20.11@o2ib500, operation ost_connect failed with -16.
      2013-04-09 16:47:38 LustreError: Skipped 9 previous similar messages
      2013-04-09 16:48:03 LustreError: 11-0: lsv-OST0007-osc-MDT0000: Communicating with 172.20.20.7@o2ib500, operation ost_connect failed with -16.
      2013-04-09 16:48:03 LustreError: Skipped 9 previous similar messages
      2013-04-09 16:48:24 Lustre: lsv-OST0001-osc-MDT0000: Connection restored to lsv-OST0001 (at 172.20.20.1@o2ib500)
      2013-04-09 16:48:24 Lustre: lsv-OST0003-osc-MDT0000: Connection restored to lsv-OST0003 (at 172.20.20.3@o2ib500)
      2013-04-09 16:49:44 LustreError: 18017:0:(osp_precreate.c:496:osp_precreate_send()) ASSERTION( lu_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100090000:0x4c00:0x0] pre used fid [0x100090000:0x16bec0:0x0]
      2013-04-09 16:49:44 LustreError: 18017:0:(osp_precreate.c:496:osp_precreate_send()) LBUG
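
      The two FIDs in the assertion message can be read as [sequence:object ID:version] triples. Both share sequence 0x100090000, but the object ID in the reply (0x4c00) is lower than the one recorded as already used (0x16bec0), which is why the `> 0` check fails. The following is a minimal sketch of that arithmetic, not the Lustre implementation: `parse_fid` and `fid_diff` are hypothetical helpers, and `fid_diff` is a simplified stand-in for `lu_fid_diff()` that only subtracts object IDs within one sequence.

```python
import re

# Matches the "[0xSEQ:0xOID:0xVER]" form printed in the assertion message.
FID_RE = re.compile(r"\[0x([0-9a-f]+):0x([0-9a-f]+):0x([0-9a-f]+)\]")


def parse_fid(text):
    """Parse '[0xSEQ:0xOID:0xVER]' into a (seq, oid, ver) tuple of ints."""
    m = FID_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a FID: {text!r}")
    return tuple(int(g, 16) for g in m.groups())


def fid_diff(a, b):
    """Simplified stand-in for lu_fid_diff(): object ID difference
    between two FIDs that belong to the same sequence (assumption)."""
    if a[0] != b[0]:
        raise ValueError("FIDs belong to different sequences")
    return a[1] - b[1]


# Values copied from the log line above.
reply_fid = parse_fid("[0x100090000:0x4c00:0x0]")
pre_used_fid = parse_fid("[0x100090000:0x16bec0:0x0]")

diff = fid_diff(reply_fid, pre_used_fid)
print(diff)  # negative, so "diff > 0" does not hold
```

      Under this reading, the OST replied with an object ID far below the last ID the MDS believes it has already used.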
      

      This is an x86_64 server with ppc64 clients. Lustre versions 2.3.63-3chaos and 2.3.63-4chaos.

      Seeing some vague similarity with LU-2895, we applied the patch from that issue, with no improvement. But this assertion is in a different function, so that is not necessarily surprising.

        Activity

          [LU-3139] osp_precreate_send()) ASSERTION( lu_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed

          simmonsja James A Simmons added a comment - Actually this is more likely LU-4653

          simmonsja James A Simmons added a comment - I just hit this same bug on the 2.5.1 branch. This happened on a newly formatted file system.

          niu Niu Yawei (Inactive) added a comment - The Sequoia system is fixed, and there is no plan to add compatibility code for 2.3.58 <-> 2.3.63.

          nedbass Ned Bass (Inactive) added a comment - The update of the sequoia filesystem was successful. If no work is planned for adding compatibility code, I think we can close this issue.

          pjones Peter Jones added a comment - Thanks for the update, Ned. I have dropped the priority slightly to reflect that this is still an important support issue but is not a general blocker for the release itself.

          nedbass Ned Bass (Inactive) added a comment - We went ahead with the proposed workaround for one affected filesystem (lscratchv, used by vulcan). We were able to bring it up under Lustre 2.3.63 without hitting this bug.

          We will do the same for the legacy Sequoia filesystem (lscratch1) tomorrow. Sequoia is already mounting a new filesystem formatted using Lustre 2.3.63, but we want to mount the old one read-only to allow data migration. I'll report back on how it goes tomorrow.

          adilger Andreas Dilger added a comment - Ned, there was a change with http://review.whamcloud.com/5820 (LU-2684) that affects the MDS FID storage in the LOV EA. This shouldn't affect normal Lustre operation, but there is a bit of churn in that code right now (e.g. LU-3152, LU-2888) that may affect upgraded filesystems, and it would probably be better to wait until that issue is resolved.

          nedbass Ned Bass (Inactive) added a comment - Niu, Alex, Di, can you think of any other on-disk format changes that may bite us after this one? I don't want to get into a state where we can't mount the filesystem under any version of Lustre.

          nedbass Ned Bass (Inactive) added a comment - Okay, we'll schedule a time to try out this fix. It will probably be sometime next week.

          di.wang Di Wang added a comment - Yes, LAST_ID should be used.

          People

            niu Niu Yawei (Inactive)
            morrone Christopher Morrone (Inactive)
            Votes: 0
            Watchers: 9
