
LBUG on client reexporting on NFS with ASSERTION( it->d.lustre.it_status != 0 ) in mdc_finish_intent_lock()

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.1.6
    • RHEL 6 kernel 2.6.32-504.bl6.Bull.59 (with Bull patches), Lustre version 2.1.6 plus a few patches.
    • 3

    Description

      A Lustre client that re-exports the filesystem as an NFS server frequently hits this LBUG:
      LustreError: 17502:0:(mdc_locks.c:797:mdc_finish_intent_lock()) ASSERTION( it->d.lustre.it_status != 0 ) failed:

      The backtrace of the process is then:
      crash> bt
      PID: 17502 TASK: ffff8808556f4080 CPU: 19 COMMAND: "nfsd"
      #0 [ffff88085aa535c8] machine_kexec at ffffffff81031dcb
      #1 [ffff88085aa53628] crash_kexec at ffffffff810b5652
      #2 [ffff88085aa536f8] panic at ffffffff814d4d5d
      #3 [ffff88085aa53778] lbug_with_loc at ffffffffa0557deb [libcfs]
      #4 [ffff88085aa53798] mdc_finish_intent_lock at ffffffffa0a1487a [mdc]
      #5 [ffff88085aa53858] mdc_intent_lock at ffffffffa0a17c48 [mdc]
      #6 [ffff88085aa53938] lmv_intent_open at ffffffffa0c8a920 [lmv]
      #7 [ffff88085aa53a38] lmv_intent_lock at ffffffffa0c8b980 [lmv]
      #8 [ffff88085aa53ac8] ll_intent_file_open at ffffffffa0b6e618 [lustre]
      #9 [ffff88085aa53b58] ll_file_open at ffffffffa0b6faad [lustre]
      #10 [ffff88085aa53c28] __dentry_open at ffffffff811784ca
      #11 [ffff88085aa53c88] dentry_open at ffffffff81178762
      #12 [ffff88085aa53cb8] nfsd_open at ffffffffa050f7ee [nfsd]
      #13 [ffff88085aa53d08] nfsd_write at ffffffffa050fc93 [nfsd]
      #14 [ffff88085aa53d68] nfsd3_proc_write at ffffffffa0518dbf [nfsd]
      #15 [ffff88085aa53dd8] nfsd_dispatch at ffffffffa0509425 [nfsd]
      #16 [ffff88085aa53e18] svc_process_common at ffffffffa03eff24 [sunrpc]
      #17 [ffff88085aa53e98] svc_process at ffffffffa03f0560 [sunrpc]
      #18 [ffff88085aa53eb8] nfsd at ffffffffa0509b52 [nfsd]
      #19 [ffff88085aa53ee8] kthread at ffffffff8108912e
      #20 [ffff88085aa53f48] kernel_thread at ffffffff810041ea

      Looking at the associated lookup_intent structure (it), we get this:
      crash> struct lookup_intent ffff88085aa53bc0
      struct lookup_intent {
        it_op = 1,
        it_flags = 578846722,
        it_create_mode = 33587200,
        d = {
          lustre = {
            it_disposition = 6,   # DISP_LOOKUP_EXECD | DISP_LOOKUP_NEG
            it_status = 0,
            it_lock_handle = 0,
            it_data = 0xffff88085eb6d400,
            it_lock_mode = 0
          }
        }
      }
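
To make the dump easier to read, here is a minimal sketch that decodes the it_disposition bitmask and models the failing check. The flag values are assumptions recalled from lustre_idl.h and should be verified against the exact 2.1.6 tree; the ticket itself only confirms that 6 means DISP_LOOKUP_EXECD | DISP_LOOKUP_NEG. The helper names decode_disposition and would_trip_assertion are hypothetical, and the condition in would_trip_assertion is an assumed reading of the code around the LASSERT (that it is reached only when DISP_IT_EXECD is not set), not a copy of the Lustre source.

```python
# Assumed DISP_* values, recalled from lustre_idl.h; verify against the
# actual source tree before relying on them.
DISP_IT_EXECD = 0x01
DISP_FLAGS = {
    DISP_IT_EXECD: "DISP_IT_EXECD",
    0x02: "DISP_LOOKUP_EXECD",
    0x04: "DISP_LOOKUP_NEG",
    0x08: "DISP_LOOKUP_POS",
    0x10: "DISP_OPEN_CREATE",
    0x20: "DISP_OPEN_OPEN",
}


def decode_disposition(value):
    """Return the names of the known flags set in value, lowest bit first."""
    return [name for bit, name in sorted(DISP_FLAGS.items()) if value & bit]


def would_trip_assertion(it_disposition, it_status):
    """Hypothetical model of the failing check.

    Assumption: the LASSERT(it_status != 0) is evaluated only when the
    intent is not marked as executed (DISP_IT_EXECD clear), so it fires
    when that flag is clear and it_status is still 0.
    """
    intent_executed = bool(it_disposition & DISP_IT_EXECD)
    return (not intent_executed) and it_status == 0


if __name__ == "__main__":
    # Values taken from the crash dump in this ticket.
    print(" | ".join(decode_disposition(6)))
    print(would_trip_assertion(6, 0))
```

Under these assumptions, the dumped intent has the lookup flags set but not DISP_IT_EXECD, while it_status is 0, which is the combination the assertion rejects.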

      Ticket LU-3564, submitted in 2013, appears to describe exactly the same issue in 2.1.5, but it never made it out of Triage.
      Is there already a fix for this issue that we could backport?

      I have a crash dump available if needed.

          Activity

            [LU-7678] LBUG on client reexporting on NFS with ASSERTION( it->d.lustre.it_status != 0 ) in mdc_finish_intent_lock()
            pjones Peter Jones added a comment -

            Thanks Sebastien!


            spiechurski Sebastien Piechurski added a comment -

            Great!
            Thanks for the analysis.
            This ticket can be closed then.

            laisiyao Lai Siyao added a comment -

            This is a duplicate of LU-2523, and the fix http://review.whamcloud.com/#/c/5417 is in 2.5.


            spiechurski Sebastien Piechurski added a comment -

            Dumps and debuginfo files have been uploaded to the FTP site under /uploads/LU-7678.

            spiechurski Sebastien Piechurski added a comment -

            Hi Peter,

            The site is IT4Innovation in the Czech Republic.
            To my knowledge they have no plans to upgrade, but if this issue is shown to be fixed in 2.5, that will be an argument for moving them off their current distribution, which will be out of support in March.

            pjones Peter Jones added a comment -

            Lai

            I know that you have worked on a few NFS issues recently. Do you recognize this at all?

            Sebastien

            Which is the site affected and do they have plans to move to a more current release soon?

            Peter


            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Seb,
            Can you upload the crash dump, along with the kernel-[common-]debuginfo and lustre-debuginfo RPMs?

            People

              laisiyao Lai Siyao
              spiechurski Sebastien Piechurski