[LU-7282] LNetError: 29399:0:(lib-move.c:661:lnet_ni_eager_recv()) ASSERTION( msg->msg_receiving ) failed Created: 11/Oct/15 Updated: 10/Oct/21 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Just had this crash in sanity test 134a:
Oct 11 14:20:08 centos6-14 kernel: [139674.309218] LNetError: 29399:0:(lib-move.c:661:lnet_ni_eager_recv()) ASSERTION( msg->msg_receiving ) failed:
Oct 11 14:20:08 centos6-14 kernel: [139674.310338] LNetError: 29399:0:(lib-move.c:661:lnet_ni_eager_recv()) LBUG
Oct 11 14:20:08 centos6-14 kernel: [139674.310909] Pid: 29399, comm: mdt00_003
Oct 11 14:20:08 centos6-14 kernel: [139674.311699]
Oct 11 14:20:08 centos6-14 kernel: [139674.311700] Call Trace:
Oct 11 14:20:08 centos6-14 kernel: [139674.312345] [<ffffffffa0ad6885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.312897] [<ffffffffa0ad6e87>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.313281] [<ffffffffa0cebdd0>] lnet_ni_eager_recv+0x1e0/0x220 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.314228] [<ffffffffa0cee5ad>] lnet_parse_local+0x54d/0xc50 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.314859] [<ffffffff8117757a>] ? cache_alloc_debugcheck_after+0x14a/0x210
Oct 11 14:20:08 centos6-14 kernel: [139674.315554] [<ffffffffa0cef37a>] lnet_parse+0x6ca/0xd20 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.316120] [<ffffffffa0cf014b>] lolnd_send+0x2b/0xa0 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.326047] [<ffffffffa0ce86eb>] lnet_ni_send+0x4b/0xf0 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.326683] [<ffffffffa0cecd63>] lnet_send+0x883/0xba0 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.327206] [<ffffffffa0cedb4c>] LNetPut+0x2fc/0x810 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.327759] [<ffffffffa1375410>] ptl_send_buf+0x1e0/0x540 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.336282] [<ffffffff81042f1c>] ? kvm_clock_read+0x1c/0x20
Oct 11 14:20:08 centos6-14 kernel: [139674.336945] [<ffffffffa1378af5>] ptl_send_rpc+0x665/0xdf0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.337816] [<ffffffffa136e536>] ptlrpc_send_new_req+0x526/0x980 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.351995] [<ffffffffa136e9fd>] ptlrpc_set_add_req+0x6d/0xb0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.352573] [<ffffffffa135affe>] ldlm_server_blocking_ast+0x64e/0x8c0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.353593] [<ffffffffa13ddf49>] tgt_blocking_ast+0x1b9/0x8c0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.354180] [<ffffffffa0ad634f>] ? cfs_trace_unlock_tcd+0x3f/0xa0 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.354650] [<ffffffffa0ae2563>] ? libcfs_debug_vmsg2+0x5d3/0xbd0 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.355232] [<ffffffffa132d094>] ldlm_work_revoke_ast_lock+0xa4/0x1a0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.366338] [<ffffffffa1372007>] ptlrpc_set_wait+0x77/0x9d0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.366889] [<ffffffff8117a334>] ? kmem_cache_alloc_node_trace+0x144/0x210
Oct 11 14:20:08 centos6-14 kernel: [139674.371428] [<ffffffffa136919f>] ? ptlrpc_prep_set+0x5f/0x290 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.371957] [<ffffffff810a00e4>] ? __init_waitqueue_head+0x24/0x40
Oct 11 14:20:08 centos6-14 kernel: [139674.372609] [<ffffffffa1369223>] ? ptlrpc_prep_set+0xe3/0x290 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.373181] [<ffffffffa132cff0>] ? ldlm_work_revoke_ast_lock+0x0/0x1a0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.374623] [<ffffffffa132a0cf>] ldlm_run_ast_work+0xcf/0x440 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.375192] [<ffffffffa1366a46>] ldlm_reclaim_full+0x536/0x8d0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.375751] [<ffffffffa135bb4c>] ldlm_handle_enqueue0+0x14c/0x1580 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.376318] [<ffffffffa13d0d91>] ? tgt_lookup_reply+0x31/0x190 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.376950] [<ffffffffa13e2f71>] tgt_enqueue+0x61/0x230 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.377476] [<ffffffffa13e3bbc>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.377925] [<ffffffffa138ecd4>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.378344] [<ffffffffa138df60>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.378827] [<ffffffff8109f82e>] kthread+0x9e/0xc0
Oct 11 14:20:08 centos6-14 kernel: [139674.379313] [<ffffffff8100c2ca>] child_rip+0xa/0x20
Oct 11 14:20:08 centos6-14 kernel: [139674.379826] [<ffffffff8109f790>] ? kthread+0x0/0xc0
Oct 11 14:20:08 centos6-14 kernel: [139674.380396] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
this is current master + few patches, two of them lnet:
Crashdump failed. |
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 12/Oct/15 ] |
|
Hi Amir, Oleg found this crash; it can be treated as low priority at this point. Thanks. |