Lustre / LU-8663

LustreError: 106058:0:(hash.c:554:cfs_hash_bd_del_locked()) ASSERTION( bd->bd_bucket->hsb_count > 0 ) failed

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Lustre build: lustre-2.8.0_0.0.llnlpreview.41-2.ch6.x86_64
    • Severity: 3

    Description

      While the filesystem was mounted and active, we began power-cycling OSSs to verify that failover worked properly.

      Several OSS nodes crashed, with traces like this one:

      2016-09-30 14:40:14 [11785.975211] BUG: unable to handle kernel paging request at 00000000deadbeef
      2016-09-30 14:40:14 [11785.984272] IP: [<ffffffff81334259>] memset+0x9/0xb0
      2016-09-30 14:40:14 [11785.986998] LustreError: 106058:0:(hash.c:554:cfs_hash_bd_del_locked()) ASSERTION( bd->bd_bucket->hsb_count > 0 ) failed:
      2016-09-30 14:40:14 [11785.986999] LustreError: 106058:0:(hash.c:554:cfs_hash_bd_del_locked()) LBUG
      2016-09-30 14:40:14 [11785.987000] Pid: 106058, comm: ldlm_bl_10
      2016-09-30 14:40:14 [11785.987000]
      2016-09-30 14:40:14 [11786.490434] Call Trace:
      2016-09-30 14:40:14 [11786.493875]  [<ffffffffa0d2e00d>] ? ofd_lvbo_free+0x4d/0xe0 [ofd]
      2016-09-30 14:40:14 [11786.501409]  [<ffffffffa1099643>] ldlm_resource_putref_locked+0x133/0x430 [ptlrpc]
      2016-09-30 14:40:14 [11786.510593]  [<ffffffffa1099952>] ldlm_res_hop_put_locked+0x12/0x20 [ptlrpc]
      2016-09-30 14:40:14 [11786.519183]  [<ffffffffa08d1b74>] cfs_hash_for_each_relax+0x1b4/0x3d0 [libcfs]
      2016-09-30 14:40:14 [11786.527976]  [<ffffffffa1096d60>] ? cleanup_resource+0x370/0x370 [ptlrpc]
      2016-09-30 14:40:14 [11786.536291]  [<ffffffffa1096d60>] ? cleanup_resource+0x370/0x370 [ptlrpc]
      2016-09-30 14:40:14 [11786.544587]  [<ffffffffa08d4dc5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
      2016-09-30 14:40:14 [11786.553382]  [<ffffffffa1094eb0>] ldlm_namespace_cleanup+0x30/0xc0 [ptlrpc]
      2016-09-30 14:40:14 [11786.561886]  [<ffffffffa1095d5f>] __ldlm_namespace_free+0x5f/0x5c0 [ptlrpc]
      2016-09-30 14:40:14 [11786.570385]  [<ffffffffa0c660e4>] ? lfsck_instance_find+0x74/0xb0 [lfsck]
      2016-09-30 14:40:14 [11786.578678]  [<ffffffff8169d015>] ? mutex_lock+0x25/0x42
      2016-09-30 14:40:14 [11786.585329]  [<ffffffffa0c6a0a8>] ? lfsck_stop+0x1b8/0x4f0 [lfsck]
      2016-09-30 14:40:14 [11786.592951]  [<ffffffff811e5fd6>] ? kmem_cache_alloc_trace+0x226/0x250
      2016-09-30 14:40:14 [11786.600978]  [<ffffffffa109631a>] ldlm_namespace_free_prior+0x5a/0x210 [ptlrpc]
      2016-09-30 14:40:14 [11786.609869]  [<ffffffffa0d1089a>] ofd_device_fini+0x8a/0x2a0 [ofd]
      2016-09-30 14:40:14 [11786.617527]  [<ffffffffa0a0a21c>] class_cleanup+0x8dc/0xd70 [obdclass]
      2016-09-30 14:40:14 [11786.625561]  [<ffffffffa0a0cbfc>] class_process_config+0x1e2c/0x2f70 [obdclass]
      2016-09-30 14:40:14 [11786.634454]  [<ffffffff811e5a63>] ? __kmalloc+0x233/0x280
      2016-09-30 14:40:14 [11786.641219]  [<ffffffffa0a0611b>] ? lustre_cfg_new+0x8b/0x400 [obdclass]
      2016-09-30 14:40:14 [11786.649424]  [<ffffffffa0a0de2f>] class_manual_cleanup+0xef/0x810 [obdclass]
      2016-09-30 14:40:14 [11786.658007]  [<ffffffffa0a3fece>] server_put_super+0x8de/0xcd0 [obdclass]
      2016-09-30 14:40:14 [11786.666272]  [<ffffffff81209572>] generic_shutdown_super+0x72/0xf0
      2016-09-30 14:40:14 [11786.673841]  [<ffffffff81209942>] kill_anon_super+0x12/0x20
      2016-09-30 14:40:14 [11786.680720]  [<ffffffffa0a11592>] lustre_kill_super+0x32/0x50 [obdclass]
      2016-09-30 14:40:14 [11786.688840]  [<ffffffff81209cf9>] deactivate_locked_super+0x49/0x60
      2016-09-30 14:40:14 [11786.696457]  [<ffffffff8120a2f6>] deactivate_super+0x46/0x60
      2016-09-30 14:40:14 [11786.703375]  [<ffffffff812282c5>] mntput_no_expire+0xc5/0x120
      2016-09-30 14:40:14 [11786.710372]  [<ffffffff81229440>] SyS_umount+0xa0/0x3b0
      2016-09-30 14:40:14 [11786.716770]  [<ffffffff816aa4c9>] system_call_fastpath+0x16/0x1b
      

          Activity

            Minh Diep made changes:
            Link (original value): This issue is related to LDEV-341
            Peter Jones made changes:
            Resolution (new value): Duplicate
            Status: Open (original), Resolved (new)
            Peter Jones added a comment:

            This is believed to be a duplicate of LU-6304, which has been landed for 2.8.1.

            Peter Jones made changes:
            Link (original value): This issue is related to JFC-21
            Minh Diep made changes:
            Link (new value): This issue is related to LDEV-341
            Peter Jones made changes:
            Link (original value): This issue is related to JFC-10
            Peter Jones made changes:
            Link (new value): This issue duplicates LU-6304
            Peter Jones made changes:
            Link (new value): This issue is related to JFC-21
            Minh Diep made changes:
            Link (new value): This issue is related to JFC-10
            Oleg Drokin added a comment:

            I believe this is a dup of LU-6304 and the patch is here: http://review.whamcloud.com/13908


            People

              Assignee: Oleg Drokin
              Reporter: Olaf Faaland
              Votes: 0
              Watchers: 7
