Details
- Bug
- Resolution: Duplicate
- Major
- None
- None
- lustre-2.8.0_0.0.llnlpreview.41-2.ch6.x86_64
- 3
- 9223372036854775807
Description
While the filesystem was mounted and active, we began power cycling OSSs to verify that failover worked properly.
Several OSS nodes crashed, with traces like this one:
2016-09-30 14:40:14 [11785.975211] BUG: unable to handle kernel paging request at 00000000deadbeef
2016-09-30 14:40:14 [11785.984272] IP: [<ffffffff81334259>] memset+0x9/0xb0
2016-09-30 14:40:14 [11785.986998] LustreError: 106058:0:(hash.c:554:cfs_hash_bd_del_locked()) ASSERTION( bd->bd_bucket->hsb_count > 0 ) failed:
2016-09-30 14:40:14 [11785.986999] LustreError: 106058:0:(hash.c:554:cfs_hash_bd_del_locked()) LBUG
2016-09-30 14:40:14 [11785.987000] Pid: 106058, comm: ldlm_bl_10
2016-09-30 14:40:14 [11785.987000]
2016-09-30 14:40:14 [11786.490434] Call Trace:
2016-09-30 14:40:14 [11786.493875] [<ffffffffa0d2e00d>] ? ofd_lvbo_free+0x4d/0xe0 [ofd]
2016-09-30 14:40:14 [11786.501409] [<ffffffffa1099643>] ldlm_resource_putref_locked+0x133/0x430 [ptlrpc]
2016-09-30 14:40:14 [11786.510593] [<ffffffffa1099952>] ldlm_res_hop_put_locked+0x12/0x20 [ptlrpc]
2016-09-30 14:40:14 [11786.519183] [<ffffffffa08d1b74>] cfs_hash_for_each_relax+0x1b4/0x3d0 [libcfs]
2016-09-30 14:40:14 [11786.527976] [<ffffffffa1096d60>] ? cleanup_resource+0x370/0x370 [ptlrpc]
2016-09-30 14:40:14 [11786.536291] [<ffffffffa1096d60>] ? cleanup_resource+0x370/0x370 [ptlrpc]
2016-09-30 14:40:14 [11786.544587] [<ffffffffa08d4dc5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
2016-09-30 14:40:14 [11786.553382] [<ffffffffa1094eb0>] ldlm_namespace_cleanup+0x30/0xc0 [ptlrpc]
2016-09-30 14:40:14 [11786.561886] [<ffffffffa1095d5f>] __ldlm_namespace_free+0x5f/0x5c0 [ptlrpc]
2016-09-30 14:40:14 [11786.570385] [<ffffffffa0c660e4>] ? lfsck_instance_find+0x74/0xb0 [lfsck]
2016-09-30 14:40:14 [11786.578678] [<ffffffff8169d015>] ? mutex_lock+0x25/0x42
2016-09-30 14:40:14 [11786.585329] [<ffffffffa0c6a0a8>] ? lfsck_stop+0x1b8/0x4f0 [lfsck]
2016-09-30 14:40:14 [11786.592951] [<ffffffff811e5fd6>] ? kmem_cache_alloc_trace+0x226/0x250
2016-09-30 14:40:14 [11786.600978] [<ffffffffa109631a>] ldlm_namespace_free_prior+0x5a/0x210 [ptlrpc]
2016-09-30 14:40:14 [11786.609869] [<ffffffffa0d1089a>] ofd_device_fini+0x8a/0x2a0 [ofd]
2016-09-30 14:40:14 [11786.617527] [<ffffffffa0a0a21c>] class_cleanup+0x8dc/0xd70 [obdclass]
2016-09-30 14:40:14 [11786.625561] [<ffffffffa0a0cbfc>] class_process_config+0x1e2c/0x2f70 [obdclass]
2016-09-30 14:40:14 [11786.634454] [<ffffffff811e5a63>] ? __kmalloc+0x233/0x280
2016-09-30 14:40:14 [11786.641219] [<ffffffffa0a0611b>] ? lustre_cfg_new+0x8b/0x400 [obdclass]
2016-09-30 14:40:14 [11786.649424] [<ffffffffa0a0de2f>] class_manual_cleanup+0xef/0x810 [obdclass]
2016-09-30 14:40:14 [11786.658007] [<ffffffffa0a3fece>] server_put_super+0x8de/0xcd0 [obdclass]
2016-09-30 14:40:14 [11786.666272] [<ffffffff81209572>] generic_shutdown_super+0x72/0xf0
2016-09-30 14:40:14 [11786.673841] [<ffffffff81209942>] kill_anon_super+0x12/0x20
2016-09-30 14:40:14 [11786.680720] [<ffffffffa0a11592>] lustre_kill_super+0x32/0x50 [obdclass]
2016-09-30 14:40:14 [11786.688840] [<ffffffff81209cf9>] deactivate_locked_super+0x49/0x60
2016-09-30 14:40:14 [11786.696457] [<ffffffff8120a2f6>] deactivate_super+0x46/0x60
2016-09-30 14:40:14 [11786.703375] [<ffffffff812282c5>] mntput_no_expire+0xc5/0x120
2016-09-30 14:40:14 [11786.710372] [<ffffffff81229440>] SyS_umount+0xa0/0x3b0
2016-09-30 14:40:14 [11786.716770] [<ffffffff816aa4c9>] system_call_fastpath+0x16/0x1b
Issue Links
- duplicates: LU-6304 crash on umount in cleanup_resource (Resolved)
Activity
Link | Original: This issue is related to LDEV-341 [ LDEV-341 ] |
Resolution | New: Duplicate [ 3 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Link | Original: This issue is related to JFC-21 [ JFC-21 ] |
Link | New: This issue is related to LDEV-341 [ LDEV-341 ] |
Link | Original: This issue is related to JFC-10 [ JFC-10 ] |
Link | New: This issue is related to JFC-21 [ JFC-21 ] |
Link | New: This issue is related to JFC-10 [ JFC-10 ] |
This is believed to be a duplicate of LU-6304, which has landed for 2.8.1.