[LU-6939] nrs_tbf.c:155:nrs_tbf_cli_reset() ASSERTION( cli->tc_rule == ((void *)0) ) failed Created: 01/Aug/15 Updated: 28/Jun/16 Resolved: 25/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Just hit this in sanityn.sh test 77f, running master with a couple of unrelated patches.

<4>[22133.708730] Lustre: DEBUG MARKER: == sanityn test 77f: check TBF JobID nrs policy == 04:53:19 (1438419199)
<4>[22133.874062] LNet: 22701:0:(nidstrings.c:271:parse_nidrange()) can't parse nidrange: "iozone.500"
<4>[22135.187825] Lustre: DEBUG MARKER: cancel_lru_locks osc start
<0>[22135.320574] LustreError: 27616:0:(nrs_tbf.c:155:nrs_tbf_cli_reset()) ASSERTION( cli->tc_rule == ((void *)0) ) failed:
<0>[22135.322270] LustreError: 27616:0:(nrs_tbf.c:155:nrs_tbf_cli_reset()) LBUG
<0>[22135.326308] Kernel panic - not syncing: LBUG in interrupt.
<0>[22135.326312]
<4>[22135.327246] Pid: 27616, comm: ll_ost00_004 Tainted: P --------------- 2.6.32-rhe6.6-debug #1
<4>[22135.327671] Call Trace:
<4>[22135.327914] [<ffffffff8151dcd9>] ? panic+0xa7/0x16f
<4>[22135.328502] [<ffffffffa0c04ecd>] ? lbug_with_loc+0x8d/0xb0 [libcfs]
<4>[22135.328893] [<ffffffffa05e6e68>] ? nrs_tbf_cli_reset+0xb8/0x120 [ptlrpc]
<4>[22135.330302] [<ffffffffa05e7560>] ? nrs_tbf_res_get+0x270/0x300 [ptlrpc]
<4>[22135.330302] [<ffffffff81041d01>] ? native_patch+0x151/0x180
<4>[22135.330302] [<ffffffffa05d8d26>] ? nrs_resource_get+0x56/0x110 [ptlrpc]
<4>[22135.330302] [<ffffffffa05da41e>] ? nrs_resource_get_safe+0x8e/0x100 [ptlrpc]
<4>[22135.330302] [<ffffffffa059b120>] ? lustre_swab_niobuf_remote+0x0/0x30 [ptlrpc]
<4>[22135.330302] [<ffffffffa05dcf3b>] ? ptlrpc_nrs_req_hp_move+0x6b/0x210 [ptlrpc]
<4>[22135.330302] [<ffffffffa05bfa95>] ? req_capsule_client_get+0x15/0x20 [ptlrpc]
<4>[22135.330302] [<ffffffffa0575458>] ? ldlm_server_blocking_ast+0x228/0x8b0 [ptlrpc]
<4>[22135.330302] [<ffffffffa05f8279>] ? tgt_blocking_ast+0x1b9/0x8c0 [ptlrpc]
<4>[22135.330302] [<ffffffffa054823d>] ? ldlm_work_bl_ast_lock+0xdd/0x290 [ptlrpc]
<4>[22135.330302] [<ffffffffa058c7d7>] ? ptlrpc_set_wait+0x77/0x9d0 [ptlrpc]
<4>[22135.330302] [<ffffffff811766c4>] ? kmem_cache_alloc_node_trace+0x144/0x210
<4>[22135.330302] [<ffffffffa0583a6f>] ? ptlrpc_prep_set+0x5f/0x290 [ptlrpc]
<4>[22135.330302] [<ffffffff8109d704>] ? __init_waitqueue_head+0x24/0x40
<4>[22135.330302] [<ffffffffa0583af3>] ? ptlrpc_prep_set+0xe3/0x290 [ptlrpc]
<4>[22135.330302] [<ffffffffa0548160>] ? ldlm_work_bl_ast_lock+0x0/0x290 [ptlrpc]
<4>[22135.330302] [<ffffffffa0544eff>] ? ldlm_run_ast_work+0xcf/0x440 [ptlrpc]
<4>[22135.330302] [<ffffffffa0562abf>] ? ldlm_process_extent_lock+0x1bf/0xab0 [ptlrpc]
<4>[22135.330302] [<ffffffffa054b23e>] ? ldlm_lock_enqueue+0x3fe/0x860 [ptlrpc]
<4>[22135.330302] [<ffffffffa0576a57>] ? ldlm_handle_enqueue0+0x7e7/0x1520 [ptlrpc]
<4>[22135.330302] [<ffffffffa05fcfe1>] ? tgt_enqueue+0x61/0x230 [ptlrpc]
<4>[22135.330302] [<ffffffffa05fdbf2>] ? tgt_request_handle+0xa42/0x1230 [ptlrpc]
<4>[22135.330302] [<ffffffffa05a94a4>] ? ptlrpc_main+0xd74/0x1850 [ptlrpc]
<4>[22135.330302] [<ffffffffa05a8730>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
<4>[22135.330302] [<ffffffff8109ce4e>] ? kthread+0x9e/0xc0
<4>[22135.330302] [<ffffffff8100c24a>] ? child_rip+0xa/0x20
<4>[22135.330302] [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
<4>[22135.330302] [<ffffffff8100c240>] ? child_rip+0x0/0x20

The crashdump is in /exports/crashdumps/192.168.10.224-2015-08-01-04\:53\:23/ |
| Comments |
| Comment by Andreas Dilger [ 07/Aug/15 ] |
|
Not sure if this is related to |
| Comment by Andreas Dilger [ 07/Aug/15 ] |
|
Li Xi, Wang Shilong, another failure from the new TBF test. |
| Comment by James Nunez (Inactive) [ 07/Aug/15 ] |
|
We've hit this once in review-zfs-part-1. Logs are at: |
| Comment by Bob Glossman (Inactive) [ 09/Oct/15 ] |
|
Another one seen on master:
| Comment by James Nunez (Inactive) [ 16/Oct/15 ] |
|
Just hit this same LBUG with sanityn test_77e. Logs are at https://testing.hpdd.intel.com/test_sets/14ec05ac-73bb-11e5-8f32-5254006e85c2 |
| Comment by James Nunez (Inactive) [ 04/Nov/15 ] |
|
More failures on master. Logs are at:
| Comment by Jian Yu [ 19/Nov/15 ] |
|
Another instance on master branch: |
| Comment by Gerrit Updater [ 15/Dec/15 ] |
|
Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/17596 |
| Comment by Emoly Liu [ 31/Dec/15 ] |
|
Another failure: |
| Comment by Gerrit Updater [ 25/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17596/ |
| Comment by Joseph Gmitter (Inactive) [ 25/Jan/16 ] |
|
Landed for 2.8.0 |
| Comment by Bob Glossman (Inactive) [ 28/Jun/16 ] |
|
on b2_7_fe: |