Lustre / LU-3486

LBUG when exporting Lustre 2.4 via NFS on SLES11SP2: ll_dops_init: ASSERTION( de->d_op == &ll_d_ops ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • Environment: SLES11SP2 patchless client exporting the file system over NFS
    • Severity: 3
    • Rank: 8767

    Description

      As noted in LU-3483 and LU-3484, we're attempting to export Lustre 2.4 via NFS on SLES11SP2. This is the third of three issues we've found while doing so.

      The NFS server is a patchless SLES11SP2 Lustre 2.4 client. When we ran mkdir from the NFS client in the root of the exported file system and then ran ls -la on the new directory, we got the following:

      2013-06-10T09:52:52.210323-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) ASSERTION( de->d_op == &ll_d_ops ) failed:
      2013-06-10T09:52:52.235599-05:00 c0-0c0s7n1 LustreError: 17310:0:(dcache.c:256:ll_dops_init()) LBUG
      2013-06-10T09:52:52.235681-05:00 c0-0c0s7n1 Pid: 17310, comm: nfsd
      2013-06-10T09:52:52.235765-05:00 c0-0c0s7n1 Call Trace:
      2013-06-10T09:52:52.235969-05:00 c0-0c0s7n1 [<ffffffff81005da9>] try_stack_unwind+0x169/0x1b0
      2013-06-10T09:52:52.260723-05:00 c0-0c0s7n1 [<ffffffff81004849>] dump_trace+0x89/0x450
      2013-06-10T09:52:52.261059-05:00 c0-0c0s7n1 [<ffffffffa07548d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
      2013-06-10T09:52:52.261291-05:00 c0-0c0s7n1 [<ffffffffa0754e37>] lbug_with_loc+0x47/0xc0 [libcfs]
      2013-06-10T09:52:52.286133-05:00 c0-0c0s7n1 [<ffffffffa0c8e7ac>] ll_dops_init+0x3cc/0x560 [lustre]
      2013-06-10T09:52:52.286416-05:00 c0-0c0s7n1 [<ffffffffa0ccd2af>] ll_iget_for_nfs+0x2ff/0x390 [lustre]
      2013-06-10T09:52:52.311470-05:00 c0-0c0s7n1 [<ffffffffa0ccdae0>] ll_get_parent+0x410/0x830 [lustre]
      2013-06-10T09:52:52.311692-05:00 c0-0c0s7n1 [<ffffffff81253ce0>] reconnect_path+0x140/0x2d0
      2013-06-10T09:52:52.311799-05:00 c0-0c0s7n1 [<ffffffff81254036>] exportfs_decode_fh+0xa6/0x280
      2013-06-10T09:52:52.311913-05:00 c0-0c0s7n1 [<ffffffff81257c33>] fh_verify+0x353/0x6b0
      2013-06-10T09:52:52.311958-05:00 c0-0c0s7n1 [<ffffffff812589f9>] nfsd_access+0x39/0x130
      2013-06-10T09:52:52.336898-05:00 c0-0c0s7n1 [<ffffffff81261e3f>] nfsd3_proc_access+0x7f/0xe0
      2013-06-10T09:52:52.337073-05:00 c0-0c0s7n1 [<ffffffff812545db>] nfsd_dispatch+0xbb/0x260
      2013-06-10T09:52:52.362097-05:00 c0-0c0s7n1 [<ffffffff81491a8b>] svc_process+0x4ab/0x7a0
      2013-06-10T09:52:52.362253-05:00 c0-0c0s7n1 [<ffffffff81254d75>] nfsd+0xd5/0x150
      2013-06-10T09:52:52.362356-05:00 c0-0c0s7n1 [<ffffffff81068e0e>] kthread+0x9e/0xb0
      2013-06-10T09:52:52.362547-05:00 c0-0c0s7n1 [<ffffffff814cfed4>] kernel_thread_helper+0x4/0x10

      Here's the code in ll_dops_init:

      int ll_dops_init(struct dentry *de, int block, int init_sa)
      {
              struct ll_dentry_data *lld = ll_d2d(de);
              int rc = 0;

              if (lld == NULL && block != 0) {
                      rc = ll_set_dd(de);
                      if (rc)
                              return rc;

                      lld = ll_d2d(de);
              }

              if (lld != NULL && init_sa != 0)
                      lld->lld_sa_generation = 0;

      #ifdef HAVE_DCACHE_LOCK
              de->d_op = &ll_d_ops;
      #else
              /* kernel >= 2.6.38 d_op is set in d_alloc() */
              LASSERT(de->d_op == &ll_d_ops);
      #endif
              return rc;
      }

      I've investigated the crash dump and found that the d_op pointer is set to ll_d_root_ops, rather than ll_d_ops.
      So I checked the dentry in question, and it IS the root dentry, which means it's correct that the dentry operations would be ll_d_root_ops.

      d_obtain_alias (the replacement for d_alloc) sets ll_d_ops, as described in the comment above, only when it creates an anonymous dentry, which it does when it can't find any alias for the inode. Presumably the root dentry already has an alias, which is why its d_op is not being set to ll_d_ops.
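To make the suspected behavior concrete, here is a minimal userspace sketch, not kernel code. The struct layouts and the function model_obtain_alias are hypothetical stand-ins invented for illustration; they only model the idea that an existing alias is returned with its d_op untouched, while a newly created anonymous dentry receives ll_d_ops.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; the real kernel structures are far larger. */
struct dentry_operations { const char *name; };
struct dentry { const struct dentry_operations *d_op; };
struct inode { struct dentry *alias; };  /* this model tracks at most one alias */

static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

/*
 * Simplified model of the alias lookup: an existing alias is returned
 * untouched; only a fresh anonymous dentry (supplied by the caller here,
 * to avoid allocation) is given the default ll_d_ops.
 */
static struct dentry *model_obtain_alias(struct inode *inode, struct dentry *anon)
{
    if (inode->alias != NULL)
        return inode->alias;    /* d_op is left as it was */

    anon->d_op = &ll_d_ops;     /* anonymous dentry gets the default ops */
    inode->alias = anon;
    return anon;
}
```

In this model, a root inode whose alias already carries ll_d_root_ops comes back with ll_d_root_ops, which is the state that trips the LASSERT in ll_dops_init.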

      Prior to kernel 2.6.38 (the HAVE_DCACHE_LOCK branch), d_op was set directly to ll_d_ops here.

      That suggests a few possible issues, with varying fixes:
      1) The assertion is wrong and it's OK for the dentry operations to be ll_d_root_ops in this case.
      2) The root dentry should never make it here; something else is wrong. (What?)
      3) It's not OK for the dentry operations to be ll_d_root_ops and we need to set them to ll_d_ops here. (But if so, why have ll_d_root_ops? This seems incorrect.)
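As an illustration of option 1 only, the check could be relaxed to accept either operations table. This is a userspace sketch under stated assumptions: the struct definitions are simplified stand-ins, and the helper ll_d_op_is_lustre is a hypothetical name that does not come from the Lustre source. The ticket text does not say which of the three options was actually adopted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct dentry_operations { const char *name; };
struct dentry { const struct dentry_operations *d_op; };

static const struct dentry_operations ll_d_ops = { "ll_d_ops" };
static const struct dentry_operations ll_d_root_ops = { "ll_d_root_ops" };

/*
 * Hypothetical helper for option 1: returns true when d_op points at
 * either of the two Lustre dentry operation tables, so the root dentry
 * would no longer fail the check.
 */
static bool ll_d_op_is_lustre(const struct dentry *de)
{
    return de->d_op == &ll_d_ops || de->d_op == &ll_d_root_ops;
}
```

Under this assumption, the assertion in the non-HAVE_DCACHE_LOCK branch would test the helper's result instead of comparing against ll_d_ops alone. Whether that is safe depends on the answer to option 2, namely whether the root dentry is expected to reach ll_dops_init at all.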

            People

              Assignee: Lai Siyao (laisiyao)
              Reporter: Patrick Farrell (paf)