Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.4.0
- None
- SLES11SP2 patchless client exporting the file system over NFS
- 3
- 8767
Description
As noted in LU-3483 and LU-3484, we're attempting to export Lustre 2.4 via NFS on SLES11SP2. This is the third of three issues we've found while doing so.
The NFS server is a patchless SLES11SP2 Lustre 2.4 client. When we did a mkdir in the root of the exported file system from the NFS client and then an ls -la of the new directory, we got the following:
—
2013-06-10T09:52:52.210323-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) ASSERTION( de->d_op == &ll_d_ops ) failed:
2013-06-10T09:52:52.235599-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) LBUG
2013-06-10T09:52:52.235681-05:00 c0-0c0s7n1 Pid: 17310, comm: nfsd
2013-06-10T09:52:52.235765-05:00 c0-0c0s7n1 Call Trace:
2013-06-10T09:52:52.235969-05:00 c0-0c0s7n1 [<ffffffff81005da9>] try_stack_unwind+0x169/0x1b0
2013-06-10T09:52:52.260723-05:00 c0-0c0s7n1 [<ffffffff81004849>] dump_trace+0x89/0x450
2013-06-10T09:52:52.261059-05:00 c0-0c0s7n1 [<ffffffffa07548d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
2013-06-10T09:52:52.261291-05:00 c0-0c0s7n1 [<ffffffffa0754e37>] lbug_with_loc+0x47/0xc0 [libcfs]
2013-06-10T09:52:52.286133-05:00 c0-0c0s7n1 [<ffffffffa0c8e7ac>] ll_dops_init+0x3cc/0x560 [lustre]
2013-06-10T09:52:52.286416-05:00 c0-0c0s7n1 [<ffffffffa0ccd2af>] ll_iget_for_nfs+0x2ff/0x390 [lustre]
2013-06-10T09:52:52.311470-05:00 c0-0c0s7n1 [<ffffffffa0ccdae0>] ll_get_parent+0x410/0x830 [lustre]
2013-06-10T09:52:52.311692-05:00 c0-0c0s7n1 [<ffffffff81253ce0>] reconnect_path+0x140/0x2d0
2013-06-10T09:52:52.311799-05:00 c0-0c0s7n1 [<ffffffff81254036>] exportfs_decode_fh+0xa6/0x280
2013-06-10T09:52:52.311913-05:00 c0-0c0s7n1 [<ffffffff81257c33>] fh_verify+0x353/0x6b0
2013-06-10T09:52:52.311958-05:00 c0-0c0s7n1 [<ffffffff812589f9>] nfsd_access+0x39/0x130
2013-06-10T09:52:52.336898-05:00 c0-0c0s7n1 [<ffffffff81261e3f>] nfsd3_proc_access+0x7f/0xe0
2013-06-10T09:52:52.337073-05:00 c0-0c0s7n1 [<ffffffff812545db>] nfsd_dispatch+0xbb/0x260
2013-06-10T09:52:52.362097-05:00 c0-0c0s7n1 [<ffffffff81491a8b>] svc_process+0x4ab/0x7a0
2013-06-10T09:52:52.362253-05:00 c0-0c0s7n1 [<ffffffff81254d75>] nfsd+0xd5/0x150
2013-06-10T09:52:52.362356-05:00 c0-0c0s7n1 [<ffffffff81068e0e>] kthread+0x9e/0xb0
2013-06-10T09:52:52.362547-05:00 c0-0c0s7n1 [<ffffffff814cfed4>] kernel_thread_helper+0x4/0x10
—
Here's the code in ll_dops_init:
—
int ll_dops_init(struct dentry *de, int block, int init_sa)
{
        struct ll_dentry_data *lld = ll_d2d(de);
        int rc = 0;

        if (lld == NULL && block != 0) {
                rc = ll_set_dd(de);
                if (rc)
                        return rc;
                lld = ll_d2d(de);
        }

        if (lld != NULL && init_sa != 0)
                lld->lld_sa_generation = 0;

#ifdef HAVE_DCACHE_LOCK
        de->d_op = &ll_d_ops;
#else
        /* kernel >= 2.6.38 d_op is set in d_alloc() */
        LASSERT(de->d_op == &ll_d_ops);
#endif
        return rc;
}
—
I've investigated the crash dump and found that the d_op pointer is set to ll_d_root_ops, rather than ll_d_ops.
So I checked the dentry in question, and it IS the root dentry, which means it's correct that the dentry operations would be ll_d_root_ops.
d_obtain_alias (replacement for d_alloc) only sets ll_d_ops as described in the comment above when it is creating an anonymous dentry (done when it can't find any aliases for the inode). Presumably, the root dentry would already have an alias, which is why it's not getting set.
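To make the alias behaviour described above concrete, here is a minimal userspace sketch. It is not kernel code: the struct layouts and model_obtain_alias are hypothetical stand-ins that only model the one property that matters here, namely that an existing alias is returned with its d_op untouched, while a newly created anonymous dentry gets the file system's default operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct dentry_operations { const char *name; };
struct dentry { const struct dentry_operations *d_op; };
struct inode {
        struct dentry *alias;                    /* existing dentry, if any */
        const struct dentry_operations *s_d_op;  /* default ops for new dentries */
};

static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

/*
 * Toy model of the lookup (assumption: mirrors only the behaviour noted
 * in this ticket). An existing alias is reused as-is; otherwise the
 * caller-supplied anonymous dentry is initialised with the default ops.
 */
static struct dentry *model_obtain_alias(struct inode *inode,
                                         struct dentry *anon)
{
        if (inode->alias != NULL)
                return inode->alias;    /* d_op is not touched */

        anon->d_op = inode->s_d_op;
        inode->alias = anon;
        return anon;
}
```

In this model, an inode whose alias is the root dentry comes back still carrying ll_d_root_ops, which is the state the assertion then trips over.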
On kernels prior to 2.6.38 (HAVE_DCACHE_LOCK defined), ll_dops_init sets d_op directly to ll_d_ops instead of asserting on it.
That suggests a few possible issues, with varying fixes:
1) The assertion is wrong and it's OK for the dentry operations to be ll_d_root_ops in this case.
2) The root dentry should never make it here, so something else is wrong. (What?)
3) It's not OK for the dentry operations to be ll_d_root_ops and we need to set them to ll_d_ops here. (But if so, why have ll_d_root_ops? This seems incorrect.)
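As an illustration of option 1 only, the sketch below contrasts the current check with a relaxed one that also accepts the root dentry operations. This is a userspace model under stated assumptions, not a proposed patch: the struct definitions are simplified stand-ins, and strict_dops_ok and relaxed_dops_ok are hypothetical names that do not exist in the Lustre source.

```c
#include <assert.h>

/* Hypothetical stand-ins; only the d_op pointer identity matters here. */
struct dentry_operations { const char *name; };
struct dentry { const struct dentry_operations *d_op; };

static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

/* The condition currently asserted in ll_dops_init() on kernels >= 2.6.38. */
static int strict_dops_ok(const struct dentry *de)
{
        return de->d_op == &ll_d_ops;
}

/* Option 1 (assumption): treat the root dentry operations as valid too. */
static int relaxed_dops_ok(const struct dentry *de)
{
        return de->d_op == &ll_d_ops || de->d_op == &ll_d_root_ops;
}
```

With a dentry whose d_op is ll_d_root_ops, the strict check fails (the LBUG seen above) while the relaxed check passes. Whether relaxing the check is actually safe depends on the answer to option 2, which this sketch does not address.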
Issue Links
- is related to
- LU-4400 Another LBUG with NFS reexport mainline 3.12 client
- Resolved