LU-4595

lod_device_free() ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • None
    • Lustre 2.6.0
    • 3
    • 12555

    Description

      Running racer against today's master (2.5.55-4-gb6a1b94) on a single node with MDSCOUNT=4 and OSTCOUNT=2, I see these LBUGs during umount.

      This loop reproduced the LBUG after 3 iterations:

      cd ~/lustre-release
      export MDSCOUNT=4
      export MOUNT_2=y
      for ((i = 0; i < 10; i++)); do
        echo -e "\n\n\n########### $i $(date) ############\n\n\n"
        llmount.sh
        sh lustre/tests/racer.sh
        umount /mnt/lustre /mnt/lustre2
        umount /mnt/mds{1..4} /mnt/ost{1..2}
        llmountcleanup.sh
      done
      
      Lustre: DEBUG MARKER: == racer test complete, duration 314 sec == 10:26:08 (1391703968)
      Lustre: Unmounted lustre-client
      Lustre: Unmounted lustre-client
      Lustre: Failing over lustre-MDT0000
      Lustre: server umount lustre-MDT0000 complete
      LustreError: 11-0: lustre-MDT0000-lwp-MDT0001: Communicating with 0@lo, operation mds_disconnect failed with -107.
      Lustre: Failing over lustre-MDT0001
      Lustre: server umount lustre-MDT0001 complete
      Lustre: Failing over lustre-MDT0002
      LustreError: 3307:0:(lod_dev.c:711:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed: 
      LustreError: 27074:0:(mdt_handler.c:4256:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: 
      LustreError: 27074:0:(mdt_handler.c:4256:mdt_fini()) LBUG
      Pid: 27074, comm: umount
      
      Call Trace:
       [<ffffffffa0968895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0968e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0b55cdf>] mdt_device_fini+0xd5f/0xda0 [mdt]
       [<ffffffffa0e2dee6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
       [<ffffffffa0e532b3>] class_cleanup+0x573/0xd30 [obdclass]
       [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa0e54fda>] class_process_config+0x156a/0x1ad0 [obdclass]
       [<ffffffffa0e4d2b3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
       [<ffffffffa0e556b9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
       [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa0e8ea19>] server_put_super+0x8e9/0xe40 [obdclass]
       [<ffffffff81184c3b>] generic_shutdown_super+0x5b/0xe0
       [<ffffffff81184d26>] kill_anon_super+0x16/0x60
       [<ffffffffa0e57576>] lustre_kill_super+0x36/0x60 [obdclass]
       [<ffffffff811854c7>] deactivate_super+0x57/0x80
       [<ffffffff811a375f>] mntput_no_expire+0xbf/0x110
       [<ffffffff811a41cb>] sys_umount+0x7b/0x3a0
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      
      Kernel panic - not syncing: LBUG
      Pid: 27074, comm: umount Not tainted 2.6.32-358.18.1.el6.lustre.x86_64 #1
      Call Trace:
       [<ffffffff8150f018>] ? panic+0xa7/0x16f
       [<ffffffffa0968eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa0b55cdf>] ? mdt_device_fini+0xd5f/0xda0 [mdt]
       [<ffffffffa0e2dee6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
       [<ffffffffa0e532b3>] ? class_cleanup+0x573/0xd30 [obdclass]
       [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa0e54fda>] ? class_process_config+0x156a/0x1ad0 [obdclass]
       [<ffffffffa0e4d2b3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
       [<ffffffffa0e556b9>] ? class_manual_cleanup+0x179/0x6f0 [obdclass]
       [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa0e8ea19>] ? server_put_super+0x8e9/0xe40 [obdclass]
       [<ffffffff81184c3b>] ? generic_shutdown_super+0x5b/0xe0
       [<ffffffff81184d26>] ? kill_anon_super+0x16/0x60
       [<ffffffffa0e57576>] ? lustre_kill_super+0x36/0x60 [obdclass]
       [<ffffffff811854c7>] ? deactivate_super+0x57/0x80
       [<ffffffff811a375f>] ? mntput_no_expire+0xbf/0x110
       [<ffffffff811a41cb>] ? sys_umount+0x7b/0x3a0
       [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      
      crash> bt
      PID: 27074  TASK: ffff880196477500  CPU: 1   COMMAND: "umount"
       #0 [ffff8801beb939b0] machine_kexec at ffffffff81035d6b
       #1 [ffff8801beb93a10] crash_kexec at ffffffff810c0e22
       #2 [ffff8801beb93ae0] panic at ffffffff8150f01f
       #3 [ffff8801beb93b60] lbug_with_loc at ffffffffa0968eeb [libcfs]
       #4 [ffff8801beb93b80] mdt_device_fini at ffffffffa0b55cdf [mdt]
       #5 [ffff8801beb93bf0] class_cleanup at ffffffffa0e532b3 [obdclass]
       #6 [ffff8801beb93c70] class_process_config at ffffffffa0e54fda [obdclass]
       #7 [ffff8801beb93d00] class_manual_cleanup at ffffffffa0e556b9 [obdclass]
       #8 [ffff8801beb93dc0] server_put_super at ffffffffa0e8ea19 [obdclass]
       #9 [ffff8801beb93e30] generic_shutdown_super at ffffffff81184c3b
      #10 [ffff8801beb93e50] kill_anon_super at ffffffff81184d26
      #11 [ffff8801beb93e70] lustre_kill_super at ffffffffa0e57576 [obdclass]
      #12 [ffff8801beb93e90] deactivate_super at ffffffff811854c7
      #13 [ffff8801beb93eb0] mntput_no_expire at ffffffff811a375f
      #14 [ffff8801beb93ee0] sys_umount at ffffffff811a41cb
      #15 [ffff8801beb93f80] system_call_fastpath at ffffffff8100b072
          RIP: 00007ff7634689a7  RSP: 00007fff84d7d120  RFLAGS: 00010202
          RAX: 00000000000000a6  RBX: ffffffff8100b072  RCX: 00007ff763d55009
          RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00007ff765c4cb90
          RBP: 00007ff765c4cb70   R8: 0000000000000000   R9: 0000000000000000
          R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
          R13: 0000000000000000  R14: 0000000000000000  R15: 00007ff765c4cbf0
          ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
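      For context, the failed assertion guards the standard lu_device teardown invariant: a device's reference count (ld_ref) must have dropped to zero before the device is torn down, so any export or connection still holding a reference at umount time trips the LASSERT and triggers an LBUG. A minimal userspace sketch of that pattern (toy names, not the actual Lustre code):

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdio.h>

      /* Hypothetical illustration of the invariant behind the LBUG:
       * toy_device and its helpers are illustrative only and do not
       * exist in the Lustre source tree. */
      struct toy_device {
              atomic_int ld_ref;  /* mirrors lu_device::ld_ref */
      };

      static void toy_device_get(struct toy_device *d)
      {
              atomic_fetch_add(&d->ld_ref, 1);
      }

      static void toy_device_put(struct toy_device *d)
      {
              atomic_fetch_sub(&d->ld_ref, 1);
      }

      /* Teardown: the analogue of the LASSERT in mdt_fini() /
       * lod_device_free().  If any reference is still held (e.g. by a
       * not-yet-disconnected export during failover), this fires. */
      static void toy_device_fini(struct toy_device *d)
      {
              assert(atomic_load(&d->ld_ref) == 0);
              printf("device freed cleanly\n");
      }

      int main(void)
      {
              struct toy_device d = { .ld_ref = 0 };

              toy_device_get(&d);  /* a user takes a reference          */
              toy_device_put(&d);  /* ...and drops it before teardown   */
              toy_device_fini(&d); /* refcount is 0: fini succeeds      */
              return 0;
      }
      ```

      In the crash above the count is nonzero at fini time, so the kernel-side equivalent of this assert panics instead of freeing the device.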
      


      People

        Assignee: Yang Sheng
        Reporter: John Hammond
