Lustre / LU-2665

LBUG while unmounting client



    Description

      Hi,

      When trying to unmount a Lustre client, we got the following problem:

      Lustre: DEBUG MARKER: Wed Nov 21 06:25:01 2012
      
      LustreError: 11559:0:(ldlm_lock.c:1697:ldlm_lock_cancel()) ### lock still has references ns:
      ptmp-MDT0000-mdc-ffff88030871bc00 lock: ffff88060dbd2d80/0x4618f3ec8d79d8be lrc: 4/0,1 mode: PW/PW res: 8590405073/266
      rrc: 2 type: FLK pid: 4414 [0->551] flags: 0x22002890 remote: 0xc8980c051f8f6afd expref: -99 pid: 4414 timeout: 0
      LustreError: 11559:0:(ldlm_lock.c:1698:ldlm_lock_cancel()) LBUG
      Pid: 11559, comm: umount
      
      Call Trace:
       [<ffffffffa040d7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa040de07>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa063343d>] ldlm_lock_cancel+0x1ad/0x1b0 [ptlrpc]
       [<ffffffffa064d245>] ldlm_cli_cancel_local+0xb5/0x380 [ptlrpc]
       [<ffffffffa06510b8>] ldlm_cli_cancel+0x58/0x3b0 [ptlrpc]
       [<ffffffffa063ae18>] cleanup_resource+0x168/0x300 [ptlrpc]
       [<ffffffffa063afda>] ldlm_resource_clean+0x2a/0x50 [ptlrpc]
       [<ffffffffa041e28f>] cfs_hash_for_each_relax+0x17f/0x380 [libcfs]
       [<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
       [<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
       [<ffffffffa041fcaf>] cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
       [<ffffffffa0637a69>] ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc]
       [<ffffffffa0638adb>] __ldlm_namespace_free+0x4b/0x540 [ptlrpc]
       [<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
       [<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
       [<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
       [<ffffffffa041fcb7>] ? cfs_hash_for_each_nolock+0x87/0x1c0 [libcfs]
       [<ffffffffa063903f>] ldlm_namespace_free_prior+0x6f/0x230 [ptlrpc]
       [<ffffffffa063fc4c>] client_disconnect_export+0x23c/0x460 [ptlrpc]
       [<ffffffffa0b42a44>] lmv_disconnect+0x644/0xc70 [lmv]
       [<ffffffffa0a470bc>] client_common_put_super+0x46c/0xe80 [lustre]
       [<ffffffffa0a47ba0>] ll_put_super+0xd0/0x360 [lustre]
       [<ffffffff8117e01c>] ? dispose_list+0x11c/0x140
       [<ffffffff8117e4a8>] ? invalidate_inodes+0x158/0x1a0
       [<ffffffff8116542b>] generic_shutdown_super+0x5b/0x110
       [<ffffffff81165546>] kill_anon_super+0x16/0x60
       [<ffffffffa050897a>] lustre_kill_super+0x4a/0x60 [obdclass]
       [<ffffffff811664e0>] deactivate_super+0x70/0x90
       [<ffffffff811826bf>] mntput_no_expire+0xbf/0x110
       [<ffffffff81183188>] sys_umount+0x78/0x3c0
       [<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
      
      Kernel panic - not syncing: LBUG
      Pid: 11559, comm: umount Not tainted 2.6.32-220.23.1.bl6.Bull.28.8.x86_64 #1
      Call Trace:
       [<ffffffff81484650>] ? panic+0x78/0x143
       [<ffffffffa040de5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa063343d>] ? ldlm_lock_cancel+0x1ad/0x1b0 [ptlrpc]
       [<ffffffffa064d245>] ? ldlm_cli_cancel_local+0xb5/0x380 [ptlrpc]
       [<ffffffffa06510b8>] ? ldlm_cli_cancel+0x58/0x3b0 [ptlrpc]
       [<ffffffffa063ae18>] ? cleanup_resource+0x168/0x300 [ptlrpc]
       [<ffffffffa063afda>] ? ldlm_resource_clean+0x2a/0x50 [ptlrpc]
       [<ffffffffa041e28f>] ? cfs_hash_for_each_relax+0x17f/0x380 [libcfs]
       [<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
       [<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
       [<ffffffffa041fcaf>] ? cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
       [<ffffffffa0637a69>] ? ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc]
       [<ffffffffa0638adb>] ? __ldlm_namespace_free+0x4b/0x540 [ptlrpc]
       [<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
       [<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
       [<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
       [<ffffffffa041fcb7>] ? cfs_hash_for_each_nolock+0x87/0x1c0 [libcfs]
       [<ffffffffa063903f>] ? ldlm_namespace_free_prior+0x6f/0x230 [ptlrpc]
       [<ffffffffa063fc4c>] ? client_disconnect_export+0x23c/0x460 [ptlrpc]
       [<ffffffffa0b42a44>] ? lmv_disconnect+0x644/0xc70 [lmv]
       [<ffffffffa0a470bc>] ? client_common_put_super+0x46c/0xe80 [lustre]
       [<ffffffffa0a47ba0>] ? ll_put_super+0xd0/0x360 [lustre]
       [<ffffffff8117e01c>] ? dispose_list+0x11c/0x140
       [<ffffffff8117e4a8>] ? invalidate_inodes+0x158/0x1a0
       [<ffffffff8116542b>] ? generic_shutdown_super+0x5b/0x110
       [<ffffffff81165546>] ? kill_anon_super+0x16/0x60
       [<ffffffffa050897a>] ? lustre_kill_super+0x4a/0x60 [obdclass]
       [<ffffffff811664e0>] ? deactivate_super+0x70/0x90
       [<ffffffff811826bf>] ? mntput_no_expire+0xbf/0x110
       [<ffffffff81183188>] ? sys_umount+0x78/0x3c0
       [<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b
      

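      For context on the lock in the log above: type FLK with extent [0->551] is a client-held byte-range (fcntl/lockf) lock over bytes 0..551, and mode PW means a write lock. The sketch below, which is my own illustration and not taken from the report, shows the kind of application call that puts such a lock on the client in the first place; a local temp file stands in for a file on the Lustre mount, where the same syscall would additionally make the client enqueue an LDLM FLK lock. If a lock like this still holds references when the namespace is cleaned up at unmount, ldlm_lock_cancel() hits the LBUG shown.

      ```python
      # Hypothetical reproducer for the client-side state, on a local file:
      # take an exclusive (write) byte-range lock over bytes 0..551, matching
      # the "[0->551]" extent and PW mode of the FLK lock in the log.
      import fcntl
      import os
      import tempfile

      fd, path = tempfile.mkstemp()
      try:
          # Lock 552 bytes starting at offset 0, i.e. byte range 0..551.
          # On a Lustre client this is what shows up as an LDLM lock of
          # type FLK, mode PW, on the MDC namespace.
          fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 552, 0, os.SEEK_SET)
          print("byte-range write lock held over bytes 0..551")
      finally:
          os.close(fd)  # closing the fd releases the lock
          os.unlink(path)
      ```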
      This issue is exactly the same as the one described in LU-1429, which is a duplicate of LU-1328, and which itself seems to be related to LU-1421.
      The issue seems to be resolved, but it is unclear to me which patches are needed to completely fix it.
      I should add that we need a fix for b2_1.

      Can you please advise?

      TIA,
      Sebastien.

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: sebastien.buisson Sebastien Buisson (Inactive)
