
LBUG (llite_nfs.c:281:ll_get_parent()) ASSERTION(body->valid & OBD_MD_FLID) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.1.5, Lustre 1.8.9, Lustre 2.4.1
    • Severity: 3
    • Rank (Obsolete): 9597

    Description

      At GE Global Research, we ran into an LBUG with a 1.8.9 client that is re-exporting 2.1.5 Lustre:

      Jul 31 10:26:46 scinfra3 kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
      Jul 31 10:26:46 scinfra3 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
      Jul 31 10:26:46 scinfra3 kernel: NFSD: starting 90-second grace period
      Jul 31 10:26:53 scinfra3 ntpd[8318]: synchronized to 3.40.208.30, stratum 2
      Jul 31 10:29:46 scinfra3 kernel: LustreError: 27396:0:(llite_nfs.c:281:ll_get_parent()) ASSERTION(body->valid & OBD_MD_FLID) failed
      Jul 31 10:29:46 scinfra3 kernel: LustreError: 27396:0:(llite_nfs.c:281:ll_get_parent()) LBUG
      Jul 31 10:29:46 scinfra3 kernel: Pid: 27396, comm: nfsd
      Jul 31 10:29:46 scinfra3 kernel:
      Jul 31 10:29:46 scinfra3 kernel: Call Trace:
      Jul 31 10:29:46 scinfra3 kernel: [ ] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
      Jul 31 10:29:46 scinfra3 kernel: [ ] lbug_with_loc+0x7a/0xd0 [libcfs]
      Jul 31 10:29:46 scinfra3 kernel: [ ] tracefile_init+0x0/0x110 [libcfs]
      Jul 31 10:29:46 scinfra3 kernel: [ ] ll_get_parent+0x1e3/0x2b0 [lustre]
      Jul 31 10:29:46 scinfra3 kernel: [ ] ll_get_dentry+0x6b/0xe0 [lustre]
      Jul 31 10:29:46 scinfra3 kernel: [ ] mutex_lock+0xd/0x1d
      Jul 31 10:29:46 scinfra3 kernel: [ ] find_exported_dentry+0x241/0x486 [exportfs]
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd_acceptable+0x0/0xdc [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] autoremove_wake_function+0x0/0x2e
      Jul 31 10:29:46 scinfra3 kernel: [ ] sunrpc_cache_lookup+0x4b/0x128 [sunrpc]
      Jul 31 10:29:46 scinfra3 kernel: [ ] exp_get_by_name+0x5b/0x71 [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] exp_find_key+0x89/0x9c [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd_acceptable+0x0/0xdc [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] ll_decode_fh+0x197/0x240 [lustre]
      Jul 31 10:29:46 scinfra3 kernel: [ ] set_current_groups+0x116/0x164
      Jul 31 10:29:46 scinfra3 kernel: [ ] fh_verify+0x29c/0x4cf [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd3_proc_getattr+0x8a/0xbe [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd_dispatch+0xd8/0x1d6 [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] svc_process+0x3f8/0x6bf [sunrpc]
      Jul 31 10:29:46 scinfra3 kernel: [ ] __down_read+0x12/0x92
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd+0x0/0x2cb [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd+0x1a5/0x2cb [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] child_rip+0xa/0x11
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd+0x0/0x2cb [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] nfsd+0x0/0x2cb [nfsd]
      Jul 31 10:29:46 scinfra3 kernel: [ ] child_rip+0x0/0x11
      Jul 31 10:29:46 scinfra3 kernel:

      It appears to be easily reproducible. We are going to try to get a core dump, but I was wondering whether anything is obvious from this trace, or whether there are other Jira tickets I might have missed. Also, is there any other information that would be useful?

      Thanks.

      Attachments

        1. unlink08.c
          10 kB
        2. lustre.log
          3.60 MB
        3. log.unlink08.lctl.dk.out.gz
          3.52 MB
        4. log.txt
          44 kB

          Activity

            [LU-3727] LBUG (llite_nfs.c:281:ll_get_parent()) ASSERTION(body->valid & OBD_MD_FLID) failed

            gerrit Gerrit Updater added a comment -

            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/14498
            Subject: LU-3727 nfs: Fix ll_get_parent() LBUG caused by permission
            Project: fs/lustre-release
            Branch: b2_3
            Current Patch Set: 1
            Commit: bc90ead229cdaa940cca1819fe6fbfb983506014

            pjones Peter Jones added a comment -

            Landed for 2.7

            gerrit Gerrit Updater added a comment -

            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/13270
            Subject: LU-3727 nfs: Fix ll_get_parent() LBUG caused by permission
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: cffa66fed624973d71bfc2c3382f1ef0a19397d4

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/7327/
            Subject: LU-3727 nfs: Fix ll_get_parent() LBUG caused by permission
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a0b959c53d10bf3f0fd6b22de46397d0c7e5f667

            lixi Li Xi (Inactive) added a comment -

            The patch 'LU-3952 nfs: don't panic NFS server if MDS fails to find FID' helps to work around the problem on the master branch, but it does not help earlier versions such as b2_1, and I don't think it fixes the root cause. Refreshed http://review.whamcloud.com/#/c/7327/ again.
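            The general idea named in that patch title, returning an error instead of panicking when a server reply lacks an expected field, can be sketched in user-space C. This is only an illustration and not the LU-3952 or LU-3727 change: the struct, the function name, the flag value, and the choice of -ESTALE as the error code are all hypothetical stand-ins rather than Lustre definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "field is present" bit; not the Lustre value. */
#define SKETCH_MD_FLID 0x1ULL

/* Hypothetical stand-in for a server reply body. */
struct sketch_body {
    uint64_t valid;      /* bitmask of fields the server filled in */
    uint64_t parent_id;  /* meaningful only when SKETCH_MD_FLID is set */
};

/*
 * Store the parent id from the reply and return 0, or return -ESTALE
 * (an assumed error code) when the reply does not carry one.
 */
static int sketch_get_parent(const struct sketch_body *body, uint64_t *parent)
{
    /* A NULL output pointer is a caller bug, so an assertion is reasonable. */
    assert(parent != NULL);

    /*
     * The reply contents depend on the remote server (for example, a
     * lookup that failed there), so a missing field is handled as an
     * error instead of being asserted.
     */
    if (body == NULL || !(body->valid & SKETCH_MD_FLID))
        return -ESTALE;

    *parent = body->parent_id;
    return 0;
}
```

            With this shape, a caller passing a reply whose valid mask lacks the flag gets a negative error code back and can fail the single request, where an assertion on the same condition would halt the calling thread.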

            paf Patrick Farrell (Inactive) added a comment -

            Frederik - There's no movement towards a fix at the moment. If you're building your own Lustre, there's an option: Alexey and Oleg dislike http://review.whamcloud.com/#/c/7327/, but it does avoid the bug, and we've been running it at Cray for a bit.

            ferner Frederik Ferner (Inactive) added a comment -

            Looks like we've just hit this as well on an NFS server/Lustre client which is still running 1.8.9 after upgrading one file system to 2.5.2. We intend to upgrade the client to 2.5.2 as well ASAP but need to upgrade all file systems first.

            Is there any indication that this might be fixed in 2.5.2?

            ihara Shuichi Ihara (Inactive) added a comment -

            Alexey, can you please describe your question in detail here?

            lixi Li Xi (Inactive) added a comment -

            Hi Alexey,

            I am sorry, but perhaps because I lack the background knowledge, I don't understand the question well. Would you please explain it a little more? And do you have any specific concerns about the patch?

            shadow Alexey Lyashkov added a comment -

            Any ability to answer?

            People

              Assignee: emoly.liu Emoly Liu
              Reporter: orentas Oz Rentas (Inactive)
              Votes: 0
              Watchers: 14