Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.4.0
-
LLNL/Hyperion
-
3
-
8505
Description
While running the SWL test with the NRS policy 'orr', an OSS hit an LBUG after 25 hours. Multiple assertion messages appeared during the initial stack dump:
2013-05-30 13:12:46 LustreError: 5770:0:(hash.c:546:cfs_hash_bd_del_locked()) ASSERTION( bd->bd_bucket->hsb_count > 0 ) failed:
2013-05-30 13:12:46 LustreError: 5770:0:(hash.c:546:cfs_hash_bd_del_locked()) LBUG
2013-05-30 13:12:46 Pid: 5770, comm: ll_ost_io00_077
2013-05-30 13:12:46
2013-05-30 13:12:46 Call Trace:
2013-05-30 13:12:46 [<ffffffffa04d1895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-05-30 13:12:46 May 30 13:12:46 [<ffffffffa04d1e97>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-05-30 13:12:46 hyperion-dit33 k [<ffffffffa04e785a>] cfs_hash_bd_del_locked+0xda/0x140 [libcfs]
2013-05-30 13:12:46 ernel: LustreErr [<ffffffffa0a467e8>] nrs_orr_hop_put_free+0x218/0x290 [ptlrpc]
2013-05-30 13:12:46 or: 5770:0:(hash [<ffffffffa0a456d8>] nrs_orr_res_put+0x28/0x60 [ptlrpc]
2013-05-30 13:12:46 .c:546:cfs_hash_ [<ffffffffa0a3eb80>] nrs_resource_put_safe+0x60/0xf0 [ptlrpc]
2013-05-30 13:12:46 bd_del_locked()) [<ffffffffa0a3ec30>] ptlrpc_nrs_req_finalize+0x20/0x30 [ptlrpc]
2013-05-30 13:12:46 ASSERTION( bd->bd_bucket->hsb_c [<ffffffffa0a05a32>] ptlrpc_server_finish_active_request+0x62/0x150 [ptlrpc]
2013-05-30 13:12:46 ount > 0 ) faile [<ffffffffa0a0c1a2>] ptlrpc_server_handle_request+0x1b2/0xc60 [ptlrpc]
2013-05-30 13:12:46 d:
2013-05-30 13:12:46 May 30 13:12 [<ffffffffa04d25de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2013-05-30 13:12:46 :46 hyperion-dit [<ffffffffa04e3d8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
2013-05-30 13:12:46 33 kernel: Lustr [<ffffffffa0a036e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
2013-05-30 13:12:46 eError: 5770:0:( [<ffffffffa0a0d71e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
2013-05-30 13:12:46 hash.c:546:cfs_h [<ffffffffa0a0cc50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
2013-05-30 13:12:46 ash_bd_del_locke [<ffffffff8100c0ca>] child_rip+0xa/0x20
2013-05-30 13:12:46 d()) LBUG
2013-05-30 13:12:46 [<ffffffffa0a0cc50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
2013-05-30 13:12:46 [<ffffffffa0a0cc50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
2013-05-30 13:12:46 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2013-05-30 13:12:46
2013-05-30 13:12:46 Kernel panic - not syncing: LBUG
2013-05-30 13:12:46 Pid: 5770, comm: ll_ost_io00_077 Tainted: P --------------- 2.6.32-358.6.2.el6_lustre.g230b174.x86_64 #1
2013-05-30 13:12:46 Call Trace:
2013-05-30 13:12:46 [<ffffffff8150d878>] ? panic+0xa7/0x16f
2013-05-30 13:12:46 May 30 13:12:46 [<ffffffffa04d1eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
2013-05-30 13:12:46 hyperion-dit33 k [<ffffffffa04e785a>] ? cfs_hash_bd_del_locked+0xda/0x140 [libcfs]
2013-05-30 13:12:46 ernel: Kernel pa [<ffffffffa0a467e8>] ? nrs_orr_hop_put_free+0x218/0x290 [ptlrpc]
2013-05-30 13:12:46 nic - not syncin [<ffffffffa0a456d8>] ? nrs_orr_res_put+0x28/0x60 [ptlrpc]
2013-05-30 13:12:46 g: LBUG
2013-05-30 13:12:46 [<ffffffffa0a3eb80>] ? nrs_resource_put_safe+0x60/0xf0 [ptlrpc]
2013-05-30 13:12:46 [<ffffffffa0a3ec30>] ? ptlrpc_nrs_req_finalize+0x20/0x30 [ptlrpc]
2013-05-30 13:12:46 [<ffffffffa0a05a32>] ? ptlrpc_server_finish_active_request+0x62/0x150 [ptlrpc]
2013-05-30 13:12:46 [<ffffffffa0a0c1a2>] ? ptlrpc_server_handle_request+0x1b2/0xc60 [ptlrpc]
2013-05-30 13:12:46 [<ffffffffa04d25de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2013-05-30 13:12:46 [<ffffffffa04e3d8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
2013-05-30 13:12:46 [<ffffffffa0a036e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
2013-05-30 13:12:46 [<ffffffffa0a0d71e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
2013-05-30 13:12:47 [<ffffffffa0a0cc50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
2013-05-30 13:12:47 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
2013-05-30 13:12:47 [<ffffffffa0a0cc50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
2013-05-30 13:12:47 [<ffffffffa0a0cc50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
2013-05-30 13:12:47 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2013-05-30 13:12:47 Initializing cgroup subsys cpuset
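
Because syslog fragments are interleaved with the console output, the call chain in the dump above is hard to read by eye. The following is a minimal sketch, not part of the original report, that pulls out only the stack frames and skips the interleaved text. The names `FRAME_RE` and `extract_frames` are hypothetical helpers, and the regular expression assumes the frame layout seen in this trace (`[<address>] [?] symbol+0xoffset/0xsize [module]`). A leading `?` is treated here as marking a frame the kernel could not confirm as part of the call chain.

```python
import re

# One kernel stack frame, in the layout seen in this trace, e.g.
#   [<ffffffffa04e785a>] ? cfs_hash_bd_del_locked+0xda/0x140 [libcfs]
# The "?" marker and the trailing "[module]" tag are both optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def extract_frames(dump):
    """Return the stack frames found in a console dump, in order.

    Text between frames (timestamps, interleaved syslog fragments)
    is ignored because only frame-shaped substrings are matched.
    """
    frames = []
    for m in FRAME_RE.finditer(dump):
        frames.append({
            "addr": m.group("addr"),
            "symbol": m.group("symbol"),
            "module": m.group("module"),  # None when no [module] tag is present
            "reliable": m.group("unreliable") is None,
        })
    return frames


if __name__ == "__main__":
    # Three frames copied from the dump, including one interleaved fragment.
    sample = (
        "2013-05-30 13:12:46 hyperion-dit33 k [<ffffffffa04e785a>] "
        "cfs_hash_bd_del_locked+0xda/0x140 [libcfs] "
        "2013-05-30 13:12:46 [<ffffffffa04d25de>] ? cfs_timer_arm+0xe/0x10 [libcfs] "
        "2013-05-30 13:12:46 [<ffffffff8100c0ca>] child_rip+0xa/0x20"
    )
    for frame in extract_frames(sample):
        marker = " " if frame["reliable"] else "?"
        print(marker, frame["symbol"], frame["module"] or "")
```

Filtering the result on `reliable` gives the chain from `ptlrpc_main` down to `cfs_hash_bd_del_locked` for the first dump. In the second dump (after the kernel panic line) every frame carries the `?` marker, so that filter would return nothing there.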