LBUG when exporting Lustre 2.4 via NFS on SLES11SP2: ll_dops_init: ASSERTION( de->d_op == &ll_d_ops ) failed

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.4.0
    • None
    • SLES11SP2 patchless client exporting the file system over NFS
    • 3
    • 8767

    Description

      As noted in LU-3483 and LU-3484, we're attempting to export Lustre 2.4 via NFS on SLES11SP2. This is the third of three issues we've found while doing so.

      Running mkdir in the root of the exported file system from the NFS client, followed by ls -la on the new directory, triggered the following LBUG on the NFS server (a patchless SLES11SP2 client running Lustre 2.4):

      2013-06-10T09:52:52.210323-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) ASSERTION( de->d_op == &ll_d_ops ) failed:
      2013-06-10T09:52:52.235599-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) LBUG
      2013-06-10T09:52:52.235681-05:00 c0-0c0s7n1 Pid: 17310, comm: nfsd
      2013-06-10T09:52:52.235765-05:00 c0-0c0s7n1 Call Trace:
      2013-06-10T09:52:52.235969-05:00 c0-0c0s7n1 [<ffffffff81005da9>] try_stack_unwind+0x169/0x1b0
      2013-06-10T09:52:52.260723-05:00 c0-0c0s7n1 [<ffffffff81004849>] dump_trace+0x89/0x450
      2013-06-10T09:52:52.261059-05:00 c0-0c0s7n1 [<ffffffffa07548d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
      2013-06-10T09:52:52.261291-05:00 c0-0c0s7n1 [<ffffffffa0754e37>] lbug_with_loc+0x47/0xc0 [libcfs]
      2013-06-10T09:52:52.286133-05:00 c0-0c0s7n1 [<ffffffffa0c8e7ac>] ll_dops_init+0x3cc/0x560 [lustre]
      2013-06-10T09:52:52.286416-05:00 c0-0c0s7n1 [<ffffffffa0ccd2af>] ll_iget_for_nfs+0x2ff/0x390 [lustre]
      2013-06-10T09:52:52.311470-05:00 c0-0c0s7n1 [<ffffffffa0ccdae0>] ll_get_parent+0x410/0x830 [lustre]
      2013-06-10T09:52:52.311692-05:00 c0-0c0s7n1 [<ffffffff81253ce0>] reconnect_path+0x140/0x2d0
      2013-06-10T09:52:52.311799-05:00 c0-0c0s7n1 [<ffffffff81254036>] exportfs_decode_fh+0xa6/0x280
      2013-06-10T09:52:52.311913-05:00 c0-0c0s7n1 [<ffffffff81257c33>] fh_verify+0x353/0x6b0
      2013-06-10T09:52:52.311958-05:00 c0-0c0s7n1 [<ffffffff812589f9>] nfsd_access+0x39/0x130
      2013-06-10T09:52:52.336898-05:00 c0-0c0s7n1 [<ffffffff81261e3f>] nfsd3_proc_access+0x7f/0xe0
      2013-06-10T09:52:52.337073-05:00 c0-0c0s7n1 [<ffffffff812545db>] nfsd_dispatch+0xbb/0x260
      2013-06-10T09:52:52.362097-05:00 c0-0c0s7n1 [<ffffffff81491a8b>] svc_process+0x4ab/0x7a0
      2013-06-10T09:52:52.362253-05:00 c0-0c0s7n1 [<ffffffff81254d75>] nfsd+0xd5/0x150
      2013-06-10T09:52:52.362356-05:00 c0-0c0s7n1 [<ffffffff81068e0e>] kthread+0x9e/0xb0
      2013-06-10T09:52:52.362547-05:00 c0-0c0s7n1 [<ffffffff814cfed4>] kernel_thread_helper+0x4/0x10

      Here's the code in ll_dops_init:

      int ll_dops_init(struct dentry *de, int block, int init_sa)
      {
              struct ll_dentry_data *lld = ll_d2d(de);
              int rc = 0;

              if (lld == NULL && block != 0) {
                      rc = ll_set_dd(de);
                      if (rc)
                              return rc;
                      lld = ll_d2d(de);
              }

              if (lld != NULL && init_sa != 0)
                      lld->lld_sa_generation = 0;

      #ifdef HAVE_DCACHE_LOCK
              de->d_op = &ll_d_ops;
      #else
              /* kernel >= 2.6.38 d_op is set in d_alloc() */
              LASSERT(de->d_op == &ll_d_ops);
      #endif
              return rc;
      }

      I've investigated the crash dump and found that the d_op pointer is set to ll_d_root_ops, rather than ll_d_ops.
      So I checked the dentry in question, and it IS the root dentry, which means it's correct that the dentry operations would be ll_d_root_ops.

      d_obtain_alias (replacement for d_alloc) only sets ll_d_ops as described in the comment above when it is creating an anonymous dentry (done when it can't find any aliases for the inode). Presumably, the root dentry would already have an alias, which is why it's not getting set.

      Prior to 2.6.38, d_op is set directly here to ll_d_ops.
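
To make the difference between the two code paths concrete, here is a minimal userspace sketch. It is not kernel or Lustre code: the structs are cut-down stand-ins, and model_obtain_alias, model_dops_init_old, model_assertion_holds and model_demo are hypothetical names used only for illustration. It assumes, as described above, that an existing alias is returned untouched and only a newly created anonymous dentry receives ll_d_ops.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; only the fields used here. */
struct dentry_operations { const char *name; };
struct dentry { const struct dentry_operations *d_op; };
struct inode { struct dentry *alias; };   /* existing alias, or NULL */

static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

/*
 * Hypothetical model of the alias lookup: an existing alias is reused as-is,
 * and only a freshly created anonymous dentry is given ll_d_ops.
 */
static struct dentry *model_obtain_alias(struct inode *inode,
                                         struct dentry *spare)
{
    if (inode->alias != NULL)
        return inode->alias;        /* d_op is left unchanged */
    spare->d_op = &ll_d_ops;        /* anonymous dentry gets ll_d_ops */
    inode->alias = spare;
    return spare;
}

/* Pre-2.6.38 branch of ll_dops_init(): d_op is overwritten unconditionally. */
static void model_dops_init_old(struct dentry *de)
{
    de->d_op = &ll_d_ops;
}

/* The condition checked by the LASSERT on kernels >= 2.6.38. */
static int model_assertion_holds(const struct dentry *de)
{
    return de->d_op == &ll_d_ops;
}

/* Walk through the root-dentry case and the new-dentry case. */
static void model_demo(void)
{
    struct dentry root = { &ll_d_root_ops };
    struct inode root_inode = { &root };
    struct dentry spare = { NULL };
    struct inode new_inode = { NULL };

    /* Root already has an alias, so it keeps ll_d_root_ops: check fails. */
    assert(!model_assertion_holds(model_obtain_alias(&root_inode, &spare)));
    /* A new anonymous dentry gets ll_d_ops: check passes. */
    assert(model_assertion_holds(model_obtain_alias(&new_inode, &spare)));
    /* The old code path silently rewrote the root dentry's ops. */
    model_dops_init_old(&root);
    assert(model_assertion_holds(&root));
}
```

Calling model_demo() walks through the three situations: the root dentry failing the check, a new anonymous dentry passing it, and the older code path hiding the mismatch by overwriting d_op.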

      That suggests a few possible issues, with varying fixes:
      1) The assertion is wrong and it's OK for the dentry operations to be ll_d_root_ops in this case.
      2) The root dentry should never make it here, something else is wrong. (What?)
      3) It's not OK for the dentry operations to be ll_d_root_ops and we need to set them to ll_d_ops here. (But if so, why have ll_d_root_ops? This seems incorrect.)
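
For discussion, options 1 and 2 might look roughly like the userspace sketch below. This is only an illustration under stated assumptions, not a proposed patch: the structs are simplified stand-ins, option1_check and option2_init_unless_root are hypothetical helpers, and the root test compares against the superblock's s_root rather than using d_parent, since a disconnected dentry can also be its own parent.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; only the fields used here. */
struct dentry_operations { const char *name; };
struct dentry { const struct dentry_operations *d_op; };
struct super_block { struct dentry *s_root; };

static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

/* Option 1 (hypothetical): relax the check so either ops table is accepted. */
static int option1_check(const struct dentry *de)
{
    return de->d_op == &ll_d_ops || de->d_op == &ll_d_root_ops;
}

/*
 * Option 2 (hypothetical): the caller skips the init step for the file
 * system root. Returns 1 if the init step ran, 0 if it was skipped.
 */
static int option2_init_unless_root(const struct super_block *sb,
                                    struct dentry *de)
{
    if (de == sb->s_root)
        return 0;                   /* root keeps ll_d_root_ops */
    assert(de->d_op == &ll_d_ops);  /* original check, non-root only */
    return 1;
}
```

Option 3 would amount to restoring the unconditional assignment of ll_d_ops that the pre-2.6.38 branch performs.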

          Activity

            simmonsja James A Simmons added a comment -

            Patch http://review.whamcloud.com/#/c/6797 has been merged to the upstream kernel as commit 3ea8f3bcabe422c6b5778089ae0929c1028e58f8.

            Since that is the case, this ticket can be closed.

            laisiyao Lai Siyao added a comment -

            patch landed.

            laisiyao Lai Siyao added a comment - Patch is on http://review.whamcloud.com/#/c/6797/
            laisiyao Lai Siyao added a comment -

            Yes, I noticed this. That's why I'm inclined to remove ll_d_root_ops and treat the root dentry like any other. I'll make a patch to test.

            paf Patrick Farrell (Inactive) added a comment -

            Lai,

            OK. It's worth noting that in kernel versions earlier than 2.6.38, ll_dops_init set the d_op pointer itself, since the kernel did not set it in d_obtain_alias. So presumably it was resetting the root dentry's d_op pointer from ll_d_root_ops to ll_d_ops.

            That suggests it's safe to drop the special ll_d_root_ops struct. The only difference is that some operations are not defined in the root dentry ops.

            Just for reference, here are the two sets of operations:

            static struct dentry_operations ll_d_root_ops = {
                    .d_compare = ll_dcompare,
                    .d_revalidate = ll_revalidate_nd,
            };

            struct dentry_operations ll_d_ops = {
                    .d_revalidate = ll_revalidate_nd,
                    .d_release = ll_release,
                    .d_delete  = ll_ddelete,
                    .d_iput    = ll_d_iput,
                    .d_compare = ll_dcompare,
            };

            laisiyao Lai Siyao added a comment -

            Patrick, yes, it should work in this way, because currently we handle root dentry differently. But if we can make sure root dentry is no different from others, we can get rid of ll_d_root_ops, and make dentry handling more consistent and simpler.


            paf Patrick Farrell (Inactive) added a comment -

            Lai,

            With option 2, I'm saying that if the root dentry shouldn't have ll_dops_init called on it, I'm not sure what we should change to avoid that.
            Is it as simple as putting a check in ll_iget_for_nfs to see if it's working with the root dentry, and then not calling ll_dops_init in that case?

            laisiyao Lai Siyao added a comment -

            I'm okay with the option 2. Because lustre root dentry won't be really revalidated, ll_dops_init() should not be called for it.

            But I don't know why the root dentry is handled differently from other dentries. Oleg, could you comment?

            People

              laisiyao Lai Siyao
              paf Patrick Farrell (Inactive)