Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4957

:osp_precreate_send()) ASSERTION( osp_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0]

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.6.0
    • Lustre 2.6.0
    • 3
    • 13716

    Description

      Hit this during rolling upgrade. Please see LU-4653 for detail steps.

      fat-amd-1.lab.whamcloud.com login: root
      Password: 
      Lustre: lustre-MDT0000: Recovery over after 0:10, of 2 clients 2 recovered and 0 were evicted.
      LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) ASSERTION( osp_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0]
      LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) LBUG
      Pid: 2166, comm: osp-pre-0
      
      Call Trace:
       [<ffffffffa0209895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0209e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0b119d7>] osp_precreate_send+0x1a47/0x1b00 [osp]
       [<ffffffffa0491304>] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc]
       [<ffffffffa0b11f79>] osp_precreate_thread+0x4e9/0xc50 [osp]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
       [<ffffffffa0b11a90>] ? osp_precreate_thread+0x0/0xc50 [osp]
      Last login: Wed  [<ffffffff8109aee6>] kthread+0x96/0xa0
      Apr 23 17:31:38  [<ffffffff8100c20a>] child_rip+0xa/0x20
      on ttyS0
       [<ffffffff8109ae50>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      Kernel panic - not syncing: LBUG
      Pid: 2166, comm: osp-pre-0 Not tainted 2.6.32-431.5.1.el6_lustre.x86_64 #1
      Call Trace:
       [<ffffffff81527983>] ? panic+0xa7/0x16f
       [<ffffffffa0209eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa0b119d7>] ? osp_precreate_send+0x1a47/0x1b00 [osp]
       [<ffffffffa0491304>] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc]
       [<ffffffffa0b11f79>] ? osp_precreate_thread+0x4e9/0xc50 [osp]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
       [<ffffffffa0b11a90>] ? osp_precreate_thread+0x0/0xc50 [osp]
       [<ffffffff8109aee6>] ? kthread+0x96/0xa0
       [<ffffffff8100c20a>] ? child_rip+0xa/0x20
       [<ffffffff8109ae50>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      Initializing cgroup subsys cpuset
      Initializing cgroup subsys cpu
      

      Attachments

        Issue Links

          Activity

            [LU-4957] :osp_precreate_send()) ASSERTION( osp_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0]

            Patch landed to Master. Please reopen ticket if more work is needed.

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. Please reopen ticket if more work is needed.
            di.wang Di Wang added a comment -

            Actually for this bug, it is because OFD(2.6) returns a IDIF directly to OSP(2.5), but old OSP can not handle this well, because in 2.5, during precreate, OSP and OFD will convert IDIF to old OSTID (with MDT0 seq). So when old OSP receives the IDIF from new OFD, it regarded it as a OSTID, so convert it to a wrong FID. (2.5 might have bug as well) , that is why we see a strange reply FID "[0x100000001:0x0:0x0]" here, which trigger the LBUG. Hmm, I probably need fix 2.5 and 2.4 as well.

            yes, if OST object exceeds 4B(it will be more for IDIF 45 bits), OST will request a new sequence. Hmm, I probably need add some test case to test it.

            Hmm, it is OSP that will tell how many objects OFD should precreate(max 10k) in one RPC. Normally OSP will only request precreate when its pool is nearly empty, and OST precreate will not exceed 10k per RPC, so the diff will at most 10k. Even if OSP and OST are out of sync because of failover, I do not see how the diff will bump up more than 10k, but need think a bit.

            di.wang Di Wang added a comment - Actually for this bug, it is because OFD(2.6) returns a IDIF directly to OSP(2.5), but old OSP can not handle this well, because in 2.5, during precreate, OSP and OFD will convert IDIF to old OSTID (with MDT0 seq). So when old OSP receives the IDIF from new OFD, it regarded it as a OSTID, so convert it to a wrong FID. (2.5 might have bug as well) , that is why we see a strange reply FID " [0x100000001:0x0:0x0] " here, which trigger the LBUG. Hmm, I probably need fix 2.5 and 2.4 as well. yes, if OST object exceeds 4B(it will be more for IDIF 45 bits), OST will request a new sequence. Hmm, I probably need add some test case to test it. Hmm, it is OSP that will tell how many objects OFD should precreate(max 10k) in one RPC. Normally OSP will only request precreate when its pool is nearly empty, and OST precreate will not exceed 10k per RPC, so the diff will at most 10k. Even if OSP and OST are out of sync because of failover, I do not see how the diff will bump up more than 10k, but need think a bit.
            adilger Andreas Dilger added a comment - - edited

            In fact, it looks like there is a separate bug in osp_fid_diff() that is being hit here. Because both of these FID sequences are for OST0000 then it continues on to subtract (0x100000000 - 0x000001e8) but return it as an int, so it returns -0x1e8 and this is what triggers the LASSERT(). Otherwise, osp_fid_diff() would itself have triggered the LASSERT:

                            LASSERTF(ost_idx1 == 0 || ost_idx2 == 0 || ost_idx1 == ost_idx2,
                                     "fid1: "DFID", fid2: "DFID"\n", PFID(fid1),
                                     PFID(fid2));
            
                            return fid_idif_id(fid1->f_seq, fid1->f_oid, 0) -
                                   fid_idif_id(fid2->f_seq, fid2->f_oid, 0);
                    }
            
                    LASSERTF(fid_seq(fid1) == fid_seq(fid2), "fid1:"DFID
                             ", fid2:"DFID"\n", PFID(fid1), PFID(fid2));
            

            Sadly, LASSERT() is not a robust form of error handling here either, and this should also be fixed. What happens in the precreate code if the number of OST objects exceeds 4B and the sequence increases as it does here? Is there some code path on the OSTs that bumps the OST LAST_ID a lot that may cause a large difference in returned ostids?

            adilger Andreas Dilger added a comment - - edited In fact, it looks like there is a separate bug in osp_fid_diff() that is being hit here. Because both of these FID sequences are for OST0000 then it continues on to subtract (0x100000000 - 0x000001e8) but return it as an int, so it returns -0x1e8 and this is what triggers the LASSERT(). Otherwise, osp_fid_diff() would itself have triggered the LASSERT: LASSERTF(ost_idx1 == 0 || ost_idx2 == 0 || ost_idx1 == ost_idx2, "fid1: " DFID ", fid2: " DFID "\n" , PFID(fid1), PFID(fid2)); return fid_idif_id(fid1->f_seq, fid1->f_oid, 0) - fid_idif_id(fid2->f_seq, fid2->f_oid, 0); } LASSERTF(fid_seq(fid1) == fid_seq(fid2), "fid1:" DFID ", fid2:" DFID "\n" , PFID(fid1), PFID(fid2)); Sadly, LASSERT() is not a robust form of error handling here either, and this should also be fixed. What happens in the precreate code if the number of OST objects exceeds 4B and the sequence increases as it does here? Is there some code path on the OSTs that bumps the OST LAST_ID a lot that may cause a large difference in returned ostids?

            There is still something strange here. Sequence 0x100000001 is not the IDIF sequence for OST1 (which would be 0x100010000 with the low 16 bits encoding the OST index). This LASSERT() was not hit on the OSP#1 precreate thread, but rather process "osp-pre-0" which is OSP#0.

            In any case, I don't think the OSP should LASSERT() on something it got over the network, just return an error and refuse to precreate objects on this OST.

            adilger Andreas Dilger added a comment - There is still something strange here. Sequence 0x100000001 is not the IDIF sequence for OST1 (which would be 0x100010000 with the low 16 bits encoding the OST index). This LASSERT() was not hit on the OSP#1 precreate thread, but rather process "osp-pre-0" which is OSP#0. In any case, I don't think the OSP should LASSERT() on something it got over the network, just return an error and refuse to precreate objects on this OST.
            di.wang Di Wang added a comment -

            It seems in 2.6, OFD returned IDIF to MDT during precreate, which the old MDT can not handle. Here is the fix
            http://review.whamcloud.com/10580

            di.wang Di Wang added a comment - It seems in 2.6, OFD returned IDIF to MDT during precreate, which the old MDT can not handle. Here is the fix http://review.whamcloud.com/10580

            unfortunately the log isn't full (echo -1 >/proc/sys/lnet/debug was missing?)

            bzzz Alex Zhuravlev added a comment - unfortunately the log isn't full (echo -1 >/proc/sys/lnet/debug was missing?)
            di.wang Di Wang added a comment -

            Ok, I will check.

            di.wang Di Wang added a comment - Ok, I will check.

            Di,
            Could you please take a look at this?
            Thank you!

            jlevi Jodi Levi (Inactive) added a comment - Di, Could you please take a look at this? Thank you!
            sarah Sarah Liu added a comment -

            debug dump log from MDS

            sarah Sarah Liu added a comment - debug dump log from MDS

            Alex,
            Could you please look into this one?
            Thank you!

            jlevi Jodi Levi (Inactive) added a comment - Alex, Could you please look into this one? Thank you!

            People

              di.wang Di Wang
              di.wang Di Wang
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: