[LU-3233] tgt_cb_last_committed()) ASSERTION( ccb->llcc_exp->exp_obd == ccb->llcc_tgt->lut_obd ) failed: Created: 26/Apr/13 Updated: 12/Sep/13 Resolved: 12/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mdt | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 7934 | ||||||||||||
| Description |
|
I see this running racer on a single node MDSCOUNT=2 llmount.sh filesystem. (I upgraded the LASSERT() to an LASSERTF().) LustreError: 10307:0:(mdt_recovery.c:390:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 4295820775, new: 4295609707 replay: 0. see LU-617.
LustreError: 10277:0:(tgt_lastrcvd.c:412:tgt_cb_last_committed()) ASSERTION( ccb->llcc_exp->exp_obd == ccb->llcc_tgt->lut_obd ) failed: exp_obd ffff88014fb182f8 != lut_obd ffff88016c536178
Breakpoint 2, lbug_with_loc (msgdata=0xffffffffa12c9360) at /root/lustre-release/libcfs/libcfs/linux/linux-debug.c:167
167 libcfs_debug_msg(msgdata, "LBUG\n");
(gdb) bt
#0 lbug_with_loc (msgdata=0xffffffffa12c9360)
at /root/lustre-release/libcfs/libcfs/linux/linux-debug.c:167
#1 0xffffffffa12613e9 in tgt_cb_last_committed (env=<value optimized out>,
th=<value optimized out>, cb=0xffff88017377cb40, err=<value optimized out>)
at /root/lustre-release/lustre/ptlrpc/../../lustre/target/tgt_lastrcvd.c:410
#2 0xffffffffa0874a24 in osd_trans_commit_cb (sb=<value optimized out>,
jcb=<value optimized out>, error=0)
at /root/lustre-release/lustre/osd-ldiskfs/osd_handler.c:646
#3 0xffffffffa082610a in ldiskfs_journal_commit_callback (
journal=<value optimized out>, txn=0xffff880177622a80)
at /root/lustre-release/ldiskfs/ldiskfs/super.c:329
#4 0xffffffffa007584f in jbd2_journal_commit_transaction (journal=0xffff880150255800)
at fs/jbd2/commit.c:1049
#5 0xffffffffa007b698 in kjournald2 (arg=0xffff880150255800) at fs/jbd2/journal.c:163
#6 0xffffffff81090626 in kthread (_create=0xffff880160ab15c8) at kernel/kthread.c:78
#7 0xffffffff8100c0ca in ?? () at arch/x86/kernel/entry_64.S:1211
#8 0x0000000000000000 in ?? ()
(gdb) p *((struct obd_device *) 0xffff88014fb182f8)
$2 = {
obd_type = 0xffff88019705ca40,
obd_magic = 2874988271,
obd_name = "lustre-MDT0001", '\000' <repeats 113 times>,
obd_uuid = {
uuid = "lustre-MDT0001_UUID", '\000' <repeats 20 times>
},
...
(gdb) p *((struct obd_device *) 0xffff88016c536178)
$3 = {
obd_type = 0xffff88019705ca40,
obd_magic = 2874988271,
obd_name = "lustre-MDT0000", '\000' <repeats 113 times>,
obd_uuid = {
uuid = "lustre-MDT0000_UUID", '\000' <repeats 20 times>
},
...
|
| Comments |
| Comment by Mikhail Pershin [ 26/Apr/13 ] |
|
so callback from second MDT comes to the first one, how intriguing. Jonh, I suspect that above message from mdt_last_rcvd_update() might be also because of that, can you add dump_stack() there to see call trace when that happens? |
| Comment by John Hammond [ 26/Apr/13 ] |
|
I tried something similar running with the patch from @@ -435,6 +437,9 @@ int tgt_last_commit_cb_add(struct thandle *th, struct lu_target *tgt,
struct dt_txn_commit_cb *dcb;
int rc;
+ LASSERTF(exp->exp_obd == tgt->lut_obd,
+ "exp_obd %p != lut_obd %p\n", exp->exp_obd, tgt->lut_obd);
+
OBD_ALLOC_PTR(ccb);
if (ccb == NULL)
return -ENOMEM;
I got the following: LustreError: 7895:0:(tgt_lastrcvd.c:441:tgt_last_commit_cb_add()) ASSERTION( exp->exp_obd == tgt->lut_obd ) failed: exp_obd ffff88016a8842f8 != lut_obd ffff88016ec42178
(gdb) bt
#0 lbug_with_loc (msgdata=0xffffffffa091b300)
at /root/lustre-release/libcfs/libcfs/linux/linux-debug.c:167
#1 0xffffffffa08b4f95 in tgt_last_commit_cb_add (th=0xffff880151342e80,
tgt=0xffff88016ec28128, exp=0xffff880162bc5000, transno=4294978364)
at /root/lustre-release/lustre/ptlrpc/../../lustre/target/tgt_lastrcvd.c:440
#2 0xffffffffa0ccff92 in mdt_txn_stop_cb (env=0xffff880170ce5480, txn=0xffff880151342e80,
cookie=0xffff88016ec28000) at /root/lustre-release/lustre/mdt/mdt_recovery.c:568
#3 0xffffffffa06da313 in dt_txn_hook_stop (env=0xffff880170ce5480, txn=0xffff880151342e80)
at /root/lustre-release/lustre/obdclass/dt_object.c:114
#4 0xffffffffa0c22d15 in osd_trans_stop (env=0xffff880170ce5480, th=0xffff880151342e80)
at /root/lustre-release/lustre/osd-ldiskfs/osd_handler.c:835
#5 0xffffffffa0d88e34 in lod_trans_stop ()
#6 0xffffffffa07db07d in mdd_trans_stop (env=<value optimized out>,
mdd=<value optimized out>, result=<value optimized out>, handle=<value optimized out>)
at /root/lustre-release/lustre/mdd/mdd_trans.c:67
#7 0xffffffffa07c1ad2 in mdd_attr_set (env=0xffff880170ce5480, obj=0xffff8801505ee3d0,
ma=<value optimized out>) at /root/lustre-release/lustre/mdd/mdd_object.c:884
#8 0xffffffffa0cd5f72 in mo_attr_set (info=0xffff88016fb80000, mfd=0xffff880152b608c0)
at /root/lustre-release/lustre/include/md_object.h:569
#9 mdt_mfd_close (info=0xffff88016fb80000, mfd=0xffff880152b608c0)
at /root/lustre-release/lustre/mdt/mdt_open.c:1769
#10 0xffffffffa0cd67d2 in mdt_close (info=0xffff88016fb80000)
at /root/lustre-release/lustre/mdt/mdt_open.c:1898
#11 0xffffffffa0cad118 in mdt_req_handle (req=0xffff8801518b0000,
supported=0xffffffffa0d11080) at /root/lustre-release/lustre/mdt/mdt_handler.c:2983
#12 mdt_handle0 (req=0xffff8801518b0000, supported=0xffffffffa0d11080)
at /root/lustre-release/lustre/mdt/mdt_handler.c:3335
#13 mdt_handle_common (req=0xffff8801518b0000, supported=0xffffffffa0d11080)
at /root/lustre-release/lustre/mdt/mdt_handler.c:3377
#14 0xffffffffa0ce94c5 in mds_readpage_handle (req=<value optimized out>)
at /root/lustre-release/lustre/mdt/mdt_mds.c:358
#15 0xffffffffa08755b8 in ptlrpc_server_handle_request (svcpt=0xffff88016fbfec00,
thread=<value optimized out>) at /root/lustre-release/lustre/ptlrpc/service.c:2016
#16 0xffffffffa087693e in ptlrpc_main (arg=0xffff88016f49a6c0)
at /root/lustre-release/lustre/ptlrpc/service.c:2522
#17 0xffffffff8100c0ca in ?? () at arch/x86/kernel/entry_64.S:1211
#18 0x0000000000000000 in ?? ()
10 mdt_close (info=0xffff88016fb80000)
(gdb) p o
$43 = (struct mdt_object *) 0xffff880154e60c10
(gdb) p/x o->mot_header.loh_fid
$47 = {
f_seq = 0x280000401,
f_oid = 0xccc,
f_ver = 0x0
}
(gdb) p o->mot_obj.mo_lu.lo_dev
$51 = (struct lu_device *) 0xffff88016ec28000 ### lustre-MDT0000
9 mdt_mfd_close (info=0xffff88016fb80000, mfd=0xffff880152b608c0)
8 mo_attr_set(...)
7 mdd_attr_set mdd_attr_set (env=0xffff880170ce5480, obj=0xffff8801505ee3d0, ma=<value optimized out>)
p obj->mo_lu.lo_dev->ld_obd
$8 = (struct obd_device *) 0xffff88016ec441b8 ### lustre-MDD0000
p *mdd_obj->mod_obj.mo_lu.lo_dev->ld_obd
$15 = (struct obd_device *) 0xffff88016ec441b8 ### lustre-MDD0000
p (struct mdd_device *)mdd_obj->mod_obj.mo_lu.lo_dev
$20 = (struct mdd_device *) 0xffff88016ec2bc00 ### lustre-MDD0000
6 mdd_trans_stop(env, mdd, result, handle)
5 lod_trans_stop(...)
4 osd_trans_stop(env, th)
3 dt_txn_hook_stop(env, txn)
2 mdt_txn_stop_cb (env=0xffff880170ce5480, txn=0xffff880151342e80, cookie=0xffff88016ec28000)
### mti == info from f10
p (struct mdt_device *) cookie
$23 = (struct mdt_device *) 0xffff88016ec28000 ### lustre-MDT0000
(gdb) p mdt
$28 = (struct mdt_device *) 0xffff88016ec28000 ### lustre-MDT0000
(gdb) p mdt->mdt_md_dev.md_lu_dev->ld_obd
$29 = (struct obd_device *) 0xffff88016ec42178 ### lustre-MDT0000
(gdb) p *mdt->mdt_md_dev.md_lu_dev->ld_obd ### lustre-MDT0000
(gdb) p mti->mti_mdt
$39 = (struct mdt_device *) 0xffff88016cd03000 ### lustre-MDT0001
(gdb) p mti->mti_mdt->mdt_md_dev.md_lu_dev->ld_obd
$40 = (struct obd_device *) 0xffff88016a8842f8 ### lustre-MDT0001
(gdb) p mti->mti_exp
$38 = (struct obd_export *) 0xffff880162bc5000
(gdb) p mti->mti_pill->rc_req
$42 = (struct ptlrpc_request *) 0xffff8801518b0000
(gdb) p mti->mti_pill->rc_req->rq_export
$37 = (struct obd_export *) 0xffff880162bc5000
(gdb) p mti->mti_pill->rc_req->rq_export->exp_obd
$35 = (struct obd_device *) 0xffff88016a8842f8 ### lustre-MDT0001
1 tgt_last_commit_cb_add (th=0xffff880151342e80, tgt=0xffff88016ec28128, exp=0xffff880162bc5000, transno=4294978364)
I have tuned racer to reproduce this more readily so please let me know if you to see more of the stack. |
| Comment by Mikhail Pershin [ 27/Apr/13 ] |
|
John, it is clear that mdt from cookie is not the same as one in mti->mti_mdt. I think that mti_mdt must be correct one, it is taken from request export, and I see that wrong mdt is taken from object which in turn was taken in mdt_close() as mfd_object. So I suspect we have found wrong mfd due to hash collision: To check that you can add extra check in mdt_handle2mfd() and check that mfd_object has the same device as info->mti_mdt. Such mfd must be ignored and not used. |
| Comment by John Hammond [ 29/Apr/13 ] |
|
From my read of the handles code, it doesn't seem likely that mdt_mfd_new() would issue the same cookie twice, as it's not really a hash value but a counter. |
| Comment by Mikhail Pershin [ 29/Apr/13 ] |
|
we need to trace down where this cookie came from, maybe this is client side issue and wrong cookie is sent to the server or something similar. |
| Comment by John Hammond [ 01/May/13 ] |
|
In mtd_reint_open() should we be doing lookup if the MDS_OPEN_BY_FID flag is set? We already have what should be the child FID in that case. Moreover if we trust that FID when MDS_OPEN_BY_FID is set then I can no longer reproduce this. But I cannot say I fully understand this code. |
| Comment by John Hammond [ 18/Jun/13 ] |
|
I think this is because ll_och_fill() may use the wrong FID under certain circumstances. Consider the following: ll_intent_file_open()
lmv_intent_lock()
mdc_intent_lock()
mdc_finish_intent_lock()
And assume that mdc_finish_intent_lock() returns -ESTALE because the FID in body does not match either of the FIDs in op_data. Then ll_intent_file_open() see -ESTALE and we have: ll_intent_file_open(file, ..., it)
ll_release_openhandle(dentry, it)
OBD_ALLOC(och)
ll_och_fill(md_exp, lli, it, och)
req = it->d.lustre.it_data
body = req_capsule_server_get(req...)
och->och_fh = body->handle
och->och_fid = lli->lli_fid
ll_close_inode_openhandle(md_exp, inode, och)
But ll_och_fill() should be using the FID from body for this close, not from the lli. Then lmv_set_open_replay_data() uses this bad FID to choose the wrong target, and so on. It seems like the fix is to have ll_och_fill() use the FID from body in general. Perhaps mdt_close() should also check that the FID sent matched the FID of the object in mfd. Now it totally trusts the cookie/mfd, and we only notice the mismatch when DNE is enabled and only because of the assertion in tgt_cb_last_committed(). |
| Comment by John Hammond [ 19/Jun/13 ] |
|
Please see http://review.whamcloud.com/6695. |
| Comment by Oleg Drokin [ 10/Jul/13 ] |
|
So, do I get this right, a data from client crashes server? |
| Comment by John Hammond [ 10/Jul/13 ] |
|
> So, do I get this right, a data from client crashes server? Yes, the och gets the wrong FID, so LMV routes the right handle to the wrong MDT (on the same MDS as the right MDT). The MDT totally trusts the handle (until we hit the assertion above). > Let's hold on on the client patch and fix the server first so that it does not crash! OK good. I'm concerned that once we don't trust handles then class_handle2object() and the synchronization (or lack of) around it needs a total rethink. Can someone suggest a less drastic solution? |
| Comment by Oleg Drokin [ 10/Jul/13 ] |
|
Right, there's normally an overtrust to those handles and underchecking their validness (mostly in locks), but granted most of the time it does not end up with such drastic crashes on server if you get the handle wrong. |
| Comment by John Hammond [ 10/Jul/13 ] |
|
Indeed. In any case I think I may have a reasonable fix for the MDT case without touching too much else. Please see http://review.whamcloud.com/6938. |
| Comment by John Hammond [ 19/Jul/13 ] |
|
Both patches landed to master. Dropping to minor as Jinshan and I had discussed whether the logic of using M_CHECK_STALE and having the server return the FID of the new file (along with an open handle) really made any sense here. It appeared that someone planned an optimization around returning the new FID but it never got implemented. I believe that open might be made more robust by having the server ignore any supplied name if MDS_OPEN_BY_FID is set, and perhaps by having the client not supply a name when it sets MDS_OPEN_BY_FID. More analysis is needed however. |
| Comment by John Hammond [ 11/Sep/13 ] |
|
See Niu's patches to remove M_CHECK_STALE and make open by FID more sane. |
| Comment by Peter Jones [ 12/Sep/13 ] |
|
Landed for 2.4.1 and 2.5.0 |