[LU-8508] kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.9.0
    • Lustre 2.8.0

    Description

      While testing Lustre DNE2, I noticed an issue with the latest master builds. When mounting storage targets on servers other than the one hosting the MGT, I get a kernel panic with the output below. To the best of my ability, I have verified that this is not a network problem. I have also tried an FE build and an earlier master build (b3419), both of which work:

       
      [root@zlfs2-oss1 ~]# mount -vvv -t lustre /dev/nvme0n1 /mnt/MDT0000
      arg[0] = /sbin/mount.lustre
      arg[1] = -v
      arg[2] = -o
      arg[3] = rw
      arg[4] = /dev/nvme0n1
      arg[5] = /mnt/MDT0000
      source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/MDT0000
      options = rw
      checking for existing Lustre data: found
      Reading CONFIGS/mountdata
      Writing CONFIGS/mountdata
      mounting device /dev/nvme0n1 at /mnt/MDT0000, flags=0x1000000 options=osd=osd-ldiskfs,user_xattr,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-MDT0000,device=/dev/nvme0n1
      mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
      
      Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
       kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
      
      Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
       kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) LBUG
      
      Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
       kernel:Kernel panic - not syncing: LBUG
      

      Attached is some debugging / more info.

      Builds Tried:
      master b3424 - issues
      master b3423 - issues
      master b3420 - issues
      master b3419 - works
      fe 2.8 b18 - works

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22004/
            Subject: LU-8508 nodemap: improve object handling in cache saving
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 45cb603b4352a73077dcc45ec2cdea403837a7ba

            pjones Peter Jones added a comment -

            Let's see how the second review goes to see whether the refresh is needed


            kit.westneat Kit Westneat (Inactive) added a comment -

            Hey Peter,

            No problem. I made the changes. Would it be better to upload them and face the tests again, or leave it as is?

            Thanks,
            Kit

            pjones Peter Jones added a comment -

            Hi Kit

            I checked with Oleg and you are right - sorry about that - so I have requested a second reviewer so that we can get this landed

            Peter


            kit.westneat Kit Westneat (Inactive) added a comment -

            Hey Peter,

            Are we talking about change 22004? I only see two style comments from Andreas. There are also a few automated "over 80 chars" comments, but I thought we were ignoring those now to match the Linux style guide. I'll refresh it, but I want to make sure I'm not missing something.

            Thanks,
            Kit

            pjones Peter Jones added a comment -

            Kit

            I think that at the moment a second reviewer is holding off in anticipation of another version, given that there are quite a number of comments, so I think it would be good to refresh it.

            Peter

            kit.westneat Kit Westneat (Inactive) added a comment -

            Hey Peter,

            I wasn't planning on it since he +1'd it, unless there were other issues found, but I can if that's desired.

            Kit

            People

              kit.westneat Kit Westneat (Inactive)
              adam.j.roe Adam Roe (Inactive)