[LU-3486] LBUG when exporting Lustre 2.4 via NFS on SLES11SP2: ll_dops_init: ASSERTION( de->d_op == &ll_d_ops ) failed Created: 20/Jun/13  Updated: 10/Jun/14  Resolved: 06/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Blocker
Reporter: Patrick Farrell (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

SLES11SP2 patchless client exporting the file system over NFS


Issue Links:
Related
is related to LU-4400 Another LBUG with NFS reexport mainli... Resolved
Severity: 3
Rank (Obsolete): 8767

 Description   

As noted in LU-3483 and LU-3484, we're attempting to export Lustre 2.4 via NFS on SLES11SP2. This is the third of three issues we've found while doing so.

When attempting to do a mkdir (in the root of the exported file system, from the nfs client, with the nfs server a patchless SLES11SP2 2.4 client) and then an ls -la of this new directory, we got the following:

2013-06-10T09:52:52.210323-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) ASSERTION( de->d_op == &ll_d_ops ) failed:
2013-06-10T09:52:52.235599-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) LBUG
2013-06-10T09:52:52.235681-05:00 c0-0c0s7n1 Pid: 17310, comm: nfsd
2013-06-10T09:52:52.235765-05:00 c0-0c0s7n1 Call Trace:
2013-06-10T09:52:52.235969-05:00 c0-0c0s7n1 [<ffffffff81005da9>] try_stack_unwind+0x169/0x1b0
2013-06-10T09:52:52.260723-05:00 c0-0c0s7n1 [<ffffffff81004849>] dump_trace+0x89/0x450
2013-06-10T09:52:52.261059-05:00 c0-0c0s7n1 [<ffffffffa07548d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
2013-06-10T09:52:52.261291-05:00 c0-0c0s7n1 [<ffffffffa0754e37>] lbug_with_loc+0x47/0xc0 [libcfs]
2013-06-10T09:52:52.286133-05:00 c0-0c0s7n1 [<ffffffffa0c8e7ac>] ll_dops_init+0x3cc/0x560 [lustre]
2013-06-10T09:52:52.286416-05:00 c0-0c0s7n1 [<ffffffffa0ccd2af>] ll_iget_for_nfs+0x2ff/0x390 [lustre]
2013-06-10T09:52:52.311470-05:00 c0-0c0s7n1 [<ffffffffa0ccdae0>] ll_get_parent+0x410/0x830 [lustre]
2013-06-10T09:52:52.311692-05:00 c0-0c0s7n1 [<ffffffff81253ce0>] reconnect_path+0x140/0x2d0
2013-06-10T09:52:52.311799-05:00 c0-0c0s7n1 [<ffffffff81254036>] exportfs_decode_fh+0xa6/0x280
2013-06-10T09:52:52.311913-05:00 c0-0c0s7n1 [<ffffffff81257c33>] fh_verify+0x353/0x6b0
2013-06-10T09:52:52.311958-05:00 c0-0c0s7n1 [<ffffffff812589f9>] nfsd_access+0x39/0x130
2013-06-10T09:52:52.336898-05:00 c0-0c0s7n1 [<ffffffff81261e3f>] nfsd3_proc_access+0x7f/0xe0
2013-06-10T09:52:52.337073-05:00 c0-0c0s7n1 [<ffffffff812545db>] nfsd_dispatch+0xbb/0x260
2013-06-10T09:52:52.362097-05:00 c0-0c0s7n1 [<ffffffff81491a8b>] svc_process+0x4ab/0x7a0
2013-06-10T09:52:52.362253-05:00 c0-0c0s7n1 [<ffffffff81254d75>] nfsd+0xd5/0x150
2013-06-10T09:52:52.362356-05:00 c0-0c0s7n1 [<ffffffff81068e0e>] kthread+0x9e/0xb0
2013-06-10T09:52:52.362547-05:00 c0-0c0s7n1 [<ffffffff814cfed4>] kernel_thread_helper+0x4/0x10

Here's the code in ll_dops_init:

int ll_dops_init(struct dentry *de, int block, int init_sa)
{
        struct ll_dentry_data *lld = ll_d2d(de);
        int rc = 0;

        /* Allocate the Lustre per-dentry data if it is missing. */
        if (lld == NULL && block != 0) {
                rc = ll_set_dd(de);
                if (rc)
                        return rc;
                lld = ll_d2d(de);
        }

        /* Reset the statahead generation when init_sa is set. */
        if (lld != NULL && init_sa != 0)
                lld->lld_sa_generation = 0;

#ifdef HAVE_DCACHE_LOCK
        de->d_op = &ll_d_ops;
#else
        /* kernel >= 2.6.38 d_op is set in d_alloc() */
        LASSERT(de->d_op == &ll_d_ops);
#endif
        return rc;
}
I've investigated the crash dump and found that the d_op pointer is set to ll_d_root_ops, rather than ll_d_ops.
So I checked the dentry in question, and it IS the root dentry, which means it's correct that the dentry operations would be ll_d_root_ops.

On this path, d_obtain_alias (which takes the place of the d_alloc mentioned in the comment above) only sets d_op to ll_d_ops when it creates an anonymous dentry, which it does only when it can't find an existing alias for the inode. Presumably the root dentry already has an alias, so it is returned with its d_op (ll_d_root_ops) unchanged.

Prior to 2.6.38 (the HAVE_DCACHE_LOCK branch), ll_dops_init sets d_op to ll_d_ops directly, so the root dentry's operations were simply overwritten.
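To make the mechanism concrete, here is a minimal user-space model (not kernel code) of the two paths. The structs and the helpers `model_d_obtain_alias`, `model_dops_check` and `model_dops_init_old` are hypothetical stand-ins; the model assumes the superblock's default dentry operations (`s_d_op`) are ll_d_ops, as the comment in ll_dops_init implies, and reduces the LASSERT to a return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; only the fields relevant here. */
struct dentry_operations { const char *name; };

static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

struct dentry { const struct dentry_operations *d_op; };
struct inode { struct dentry *alias; };   /* NULL if no alias exists */
struct super_block {
	/* Assumed default ops for newly allocated dentries. */
	const struct dentry_operations *s_d_op;
};

/* Simplified d_obtain_alias() (>= 2.6.38): an existing alias is returned
 * untouched; only a newly allocated anonymous dentry gets sb->s_d_op. */
static struct dentry *model_d_obtain_alias(const struct super_block *sb,
					   struct inode *inode,
					   struct dentry *fresh)
{
	if (inode->alias != NULL)
		return inode->alias;
	fresh->d_op = sb->s_d_op;
	inode->alias = fresh;
	return fresh;
}

/* >= 2.6.38 branch of ll_dops_init(): 0 where the LASSERT holds,
 * -1 where the real code would LBUG. */
static int model_dops_check(const struct dentry *de)
{
	return de->d_op == &ll_d_ops ? 0 : -1;
}

/* < 2.6.38 (HAVE_DCACHE_LOCK) branch: d_op is simply overwritten. */
static void model_dops_init_old(struct dentry *de)
{
	de->d_op = &ll_d_ops;
}
```

Fed the root inode, the model hands back the existing root dentry still carrying ll_d_root_ops, so the >= 2.6.38 check fails as in the crash; a directory with no alias gets a fresh dentry with ll_d_ops and passes. The old branch never notices the difference because it overwrites d_op.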

That suggests a few possible issues, with varying fixes:
1) The assertion is wrong and it's OK for the dentry operations to be ll_d_root_ops in this case.
2) The root dentry should never reach this point; something else is wrong. (What?)
3) It's not OK for the dentry operations to be ll_d_root_ops and we need to set them to ll_d_ops here. (But if so, why have ll_d_root_ops? This seems incorrect.)



 Comments   
Comment by Lai Siyao [ 24/Jun/13 ]

I'm okay with option 2. Because the Lustre root dentry is never really revalidated, ll_dops_init() should not be called for it.

But I don't know why the root dentry is handled differently from other dentries. Oleg, could you comment?

Comment by Patrick Farrell (Inactive) [ 24/Jun/13 ]

Lai,

With option 2, if the root dentry shouldn't have ll_dops_init called on it, I'm not sure what we should change to avoid that.
Is it as simple as putting a check in ll_iget_for_nfs to see if it's working with the root dentry, and then not calling ll_dops_init in that case?

Comment by Lai Siyao [ 24/Jun/13 ]

Patrick, yes, that should work, because we currently handle the root dentry differently. But if we can make sure the root dentry is no different from the others, we can get rid of ll_d_root_ops and make dentry handling more consistent and simpler.
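A minimal sketch of the option 2 check, again as a user-space model under stated assumptions: `model_iget_for_nfs_tail` is a hypothetical stand-in for the part of ll_iget_for_nfs that calls ll_dops_init, not its actual code. It compares against `sb->s_root` rather than using `IS_ROOT()`, because a disconnected anonymous dentry returned by d_obtain_alias is its own parent and would also satisfy `IS_ROOT()`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel types. */
struct dentry_operations { const char *name; };
static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

struct dentry { const struct dentry_operations *d_op; };
struct super_block { struct dentry *s_root; };

/* >= 2.6.38 ll_dops_init() check: -1 where the LASSERT would fire. */
static int model_dops_init(const struct dentry *de)
{
	return de->d_op == &ll_d_ops ? 0 : -1;
}

/* Option 2: leave the root dentry (and its ll_d_root_ops) alone.
 * Compare with sb->s_root: IS_ROOT() would also match the anonymous,
 * disconnected dentries that d_obtain_alias() can return. */
static int model_iget_for_nfs_tail(const struct super_block *sb,
				   struct dentry *result)
{
	if (result == sb->s_root)
		return 0;
	return model_dops_init(result);
}
```

The trade-off is that ll_d_root_ops stays a special case every caller has to remember, which is why the alternative of dropping it is attractive.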

Comment by Patrick Farrell (Inactive) [ 24/Jun/13 ]

Lai,

OK. It's worth noting that in kernel versions earlier than 2.6.38, ll_dops_init was setting the d_op pointer itself, since d_obtain_alias in those kernels didn't set it. So presumably it was resetting the root dentry's d_op pointer from ll_d_root_ops to ll_d_ops.

That suggests it's safe to drop the special ll_d_root_ops struct. The only difference is that some operations (d_release, d_delete and d_iput) are not defined in the root dentry ops.

Just for reference, here are the two sets of operations:

static struct dentry_operations ll_d_root_ops = {
        .d_compare = ll_dcompare,
        .d_revalidate = ll_revalidate_nd,
};

struct dentry_operations ll_d_ops = {
        .d_revalidate = ll_revalidate_nd,
        .d_release = ll_release,
        .d_delete  = ll_ddelete,
        .d_iput    = ll_d_iput,
        .d_compare = ll_dcompare,
};

Comment by Lai Siyao [ 25/Jun/13 ]

Yes, I noticed this; that's why I'd like to remove ll_d_root_ops and treat the root dentry like any other. I'll make a patch to test.
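A sketch of the single-table approach, modelled in user space under the assumption that the one remaining table is installed as the superblock default (`s_d_op`) and also used for the root dentry at mount time. `model_fill_super` and the other helpers are hypothetical, and only the existing-alias case of d_obtain_alias is modelled; this is not the code of the patch.

```c
#include <assert.h>

/* Hypothetical model after dropping ll_d_root_ops: one table for all
 * Lustre dentries, installed as the superblock default. */
struct dentry_operations { const char *name; };
static const struct dentry_operations ll_d_ops = { "ll_d_ops" };

struct dentry { const struct dentry_operations *d_op; };
struct inode { struct dentry *alias; };
struct super_block {
	const struct dentry_operations *s_d_op;
	struct dentry *s_root;
};

/* Mount-time setup: the root dentry takes the same default ops as any
 * other dentry instead of a root-only table. */
static void model_fill_super(struct super_block *sb, struct dentry *root,
			     struct inode *root_inode)
{
	sb->s_d_op = &ll_d_ops;
	root->d_op = sb->s_d_op;
	root_inode->alias = root;
	sb->s_root = root;
}

/* NFS reconnect path: d_obtain_alias() returns the existing root alias
 * (only that case is modelled here). */
static struct dentry *model_d_obtain_alias(struct inode *inode)
{
	return inode->alias;
}

/* >= 2.6.38 ll_dops_init() check: -1 where the LASSERT would fire. */
static int model_dops_check(const struct dentry *de)
{
	return de->d_op == &ll_d_ops ? 0 : -1;
}
```

With only one table, the reconnect path can return the root alias unchanged and the assertion in ll_dops_init still holds, with no root special case in ll_iget_for_nfs.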

Comment by Lai Siyao [ 27/Jun/13 ]

Patch is on http://review.whamcloud.com/#/c/6797/

Comment by Lai Siyao [ 06/Aug/13 ]

Patch landed.

Comment by James A Simmons [ 10/Jun/14 ]

Patch http://review.whamcloud.com/#/c/6797 has been merged to the upstream kernel as commit: 3ea8f3bcabe422c6b5778089ae0929c1028e58f8

Since this is the case, this ticket can be closed.

Generated at Sat Feb 10 01:34:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.