[LU-4957] :osp_precreate_send()) ASSERTION( osp_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0] Created: 25/Apr/14  Updated: 11/Jun/14  Resolved: 11/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: dne

Attachments: File lustre-log.1401303426.3299    
Issue Links:
Related
is related to LU-4653 Hit LBUG ASSERTION( fid_seq(fid1) == ... Resolved
Severity: 3
Rank (Obsolete): 13716

 Description   

Hit this during rolling upgrade. Please see LU-4653 for detail steps.

fat-amd-1.lab.whamcloud.com login: root
Password: 
Lustre: lustre-MDT0000: Recovery over after 0:10, of 2 clients 2 recovered and 0 were evicted.
LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) ASSERTION( osp_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0]
LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) LBUG
Pid: 2166, comm: osp-pre-0

Call Trace:
 [<ffffffffa0209895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0209e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0b119d7>] osp_precreate_send+0x1a47/0x1b00 [osp]
 [<ffffffffa0491304>] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc]
 [<ffffffffa0b11f79>] osp_precreate_thread+0x4e9/0xc50 [osp]
 [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0b11a90>] ? osp_precreate_thread+0x0/0xc50 [osp]
Last login: Wed  [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 23 17:31:38  [<ffffffff8100c20a>] child_rip+0xa/0x20
on ttyS0
 [<ffffffff8109ae50>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 2166, comm: osp-pre-0 Not tainted 2.6.32-431.5.1.el6_lustre.x86_64 #1
Call Trace:
 [<ffffffff81527983>] ? panic+0xa7/0x16f
 [<ffffffffa0209eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0b119d7>] ? osp_precreate_send+0x1a47/0x1b00 [osp]
 [<ffffffffa0491304>] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc]
 [<ffffffffa0b11f79>] ? osp_precreate_thread+0x4e9/0xc50 [osp]
 [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0b11a90>] ? osp_precreate_thread+0x0/0xc50 [osp]
 [<ffffffff8109aee6>] ? kthread+0x96/0xa0
 [<ffffffff8100c20a>] ? child_rip+0xa/0x20
 [<ffffffff8109ae50>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu


 Comments   
Comment by Jodi Levi (Inactive) [ 25/Apr/14 ]

Alex,
Could you please look into this one?
Thank you!

Comment by Sarah Liu [ 28/May/14 ]

debug dump log from MDS

Comment by Jodi Levi (Inactive) [ 02/Jun/14 ]

Di,
Could you please take a look at this?
Thank you!

Comment by Di Wang [ 03/Jun/14 ]

Ok, I will check.

Comment by Alex Zhuravlev [ 03/Jun/14 ]

unfortunately the log isn't full (echo -1 >/proc/sys/lnet/debug was missing?)

Comment by Di Wang [ 03/Jun/14 ]

It seems in 2.6, OFD returned IDIF to MDT during precreate, which the old MDT can not handle. Here is the fix
http://review.whamcloud.com/10580

Comment by Andreas Dilger [ 06/Jun/14 ]

There is still something strange here. Sequence 0x100000001 is not the IDIF sequence for OST1 (which would be 0x100010000 with the low 16 bits encoding the OST index). This LASSERT() was not hit on the OSP#1 precreate thread, but rather process "osp-pre-0" which is OSP#0.

In any case, I don't think the OSP should LASSERT() on something it got over the network, just return an error and refuse to precreate objects on this OST.

Comment by Andreas Dilger [ 06/Jun/14 ]

In fact, it looks like there is a separate bug in osp_fid_diff() that is being hit here. Because both of these FID sequences are for OST0000 then it continues on to subtract (0x100000000 - 0x000001e8) but return it as an int, so it returns -0x1e8 and this is what triggers the LASSERT(). Otherwise, osp_fid_diff() would itself have triggered the LASSERT:

                LASSERTF(ost_idx1 == 0 || ost_idx2 == 0 || ost_idx1 == ost_idx2,
                         "fid1: "DFID", fid2: "DFID"\n", PFID(fid1),
                         PFID(fid2));

                return fid_idif_id(fid1->f_seq, fid1->f_oid, 0) -
                       fid_idif_id(fid2->f_seq, fid2->f_oid, 0);
        }

        LASSERTF(fid_seq(fid1) == fid_seq(fid2), "fid1:"DFID
                 ", fid2:"DFID"\n", PFID(fid1), PFID(fid2));

Sadly, LASSERT() is not a robust form of error handling here either, and this should also be fixed. What happens in the precreate code if the number of OST objects exceeds 4B and the sequence increases as it does here? Is there some code path on the OSTs that bumps the OST LAST_ID a lot that may cause a large difference in returned ostids?

Comment by Di Wang [ 06/Jun/14 ]

Actually for this bug, it is because OFD(2.6) returns a IDIF directly to OSP(2.5), but old OSP can not handle this well, because in 2.5, during precreate, OSP and OFD will convert IDIF to old OSTID (with MDT0 seq). So when old OSP receives the IDIF from new OFD, it regarded it as a OSTID, so convert it to a wrong FID. (2.5 might have bug as well) , that is why we see a strange reply FID "[0x100000001:0x0:0x0]" here, which trigger the LBUG. Hmm, I probably need fix 2.5 and 2.4 as well.

yes, if OST object exceeds 4B(it will be more for IDIF 45 bits), OST will request a new sequence. Hmm, I probably need add some test case to test it.

Hmm, it is OSP that will tell how many objects OFD should precreate(max 10k) in one RPC. Normally OSP will only request precreate when its pool is nearly empty, and OST precreate will not exceed 10k per RPC, so the diff will at most 10k. Even if OSP and OST are out of sync because of failover, I do not see how the diff will bump up more than 10k, but need think a bit.

Comment by Jodi Levi (Inactive) [ 11/Jun/14 ]

Patch landed to Master. Please reopen ticket if more work is needed.

Generated at Sat Feb 10 01:47:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.