[LU-7282] LNetError: 29399:0:(lib-move.c:661:lnet_ni_eager_recv()) ASSERTION( msg->msg_receiving ) failed Created: 11/Oct/15  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Amir Shehata (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Just had this crash in sanity test 134a:

Oct 11 14:20:08 centos6-14 kernel: [139674.309218] LNetError: 29399:0:(lib-move.c:661:lnet_ni_eager_recv()) ASSERTION( msg->msg_receiving ) failed: 
Oct 11 14:20:08 centos6-14 kernel: [139674.310338] LNetError: 29399:0:(lib-move.c:661:lnet_ni_eager_recv()) LBUG
Oct 11 14:20:08 centos6-14 kernel: [139674.310909] Pid: 29399, comm: mdt00_003
Oct 11 14:20:08 centos6-14 kernel: [139674.311699] 
Oct 11 14:20:08 centos6-14 kernel: [139674.311700] Call Trace:
Oct 11 14:20:08 centos6-14 kernel: [139674.312345]  [<ffffffffa0ad6885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.312897]  [<ffffffffa0ad6e87>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.313281]  [<ffffffffa0cebdd0>] lnet_ni_eager_recv+0x1e0/0x220 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.314228]  [<ffffffffa0cee5ad>] lnet_parse_local+0x54d/0xc50 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.314859]  [<ffffffff8117757a>] ? cache_alloc_debugcheck_after+0x14a/0x210
Oct 11 14:20:08 centos6-14 kernel: [139674.315554]  [<ffffffffa0cef37a>] lnet_parse+0x6ca/0xd20 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.316120]  [<ffffffffa0cf014b>] lolnd_send+0x2b/0xa0 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.326047]  [<ffffffffa0ce86eb>] lnet_ni_send+0x4b/0xf0 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.326683]  [<ffffffffa0cecd63>] lnet_send+0x883/0xba0 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.327206]  [<ffffffffa0cedb4c>] LNetPut+0x2fc/0x810 [lnet]
Oct 11 14:20:08 centos6-14 kernel: [139674.327759]  [<ffffffffa1375410>] ptl_send_buf+0x1e0/0x540 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.336282]  [<ffffffff81042f1c>] ? kvm_clock_read+0x1c/0x20
Oct 11 14:20:08 centos6-14 kernel: [139674.336945]  [<ffffffffa1378af5>] ptl_send_rpc+0x665/0xdf0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.337816]  [<ffffffffa136e536>] ptlrpc_send_new_req+0x526/0x980 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.351995]  [<ffffffffa136e9fd>] ptlrpc_set_add_req+0x6d/0xb0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.352573]  [<ffffffffa135affe>] ldlm_server_blocking_ast+0x64e/0x8c0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.353593]  [<ffffffffa13ddf49>] tgt_blocking_ast+0x1b9/0x8c0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.354180]  [<ffffffffa0ad634f>] ? cfs_trace_unlock_tcd+0x3f/0xa0 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.354650]  [<ffffffffa0ae2563>] ? libcfs_debug_vmsg2+0x5d3/0xbd0 [libcfs]
Oct 11 14:20:08 centos6-14 kernel: [139674.355232]  [<ffffffffa132d094>] ldlm_work_revoke_ast_lock+0xa4/0x1a0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.366338]  [<ffffffffa1372007>] ptlrpc_set_wait+0x77/0x9d0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.366889]  [<ffffffff8117a334>] ? kmem_cache_alloc_node_trace+0x144/0x210
Oct 11 14:20:08 centos6-14 kernel: [139674.371428]  [<ffffffffa136919f>] ? ptlrpc_prep_set+0x5f/0x290 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.371957]  [<ffffffff810a00e4>] ? __init_waitqueue_head+0x24/0x40
Oct 11 14:20:08 centos6-14 kernel: [139674.372609]  [<ffffffffa1369223>] ? ptlrpc_prep_set+0xe3/0x290 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.373181]  [<ffffffffa132cff0>] ? ldlm_work_revoke_ast_lock+0x0/0x1a0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.374623]  [<ffffffffa132a0cf>] ldlm_run_ast_work+0xcf/0x440 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.375192]  [<ffffffffa1366a46>] ldlm_reclaim_full+0x536/0x8d0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.375751]  [<ffffffffa135bb4c>] ldlm_handle_enqueue0+0x14c/0x1580 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.376318]  [<ffffffffa13d0d91>] ? tgt_lookup_reply+0x31/0x190 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.376950]  [<ffffffffa13e2f71>] tgt_enqueue+0x61/0x230 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.377476]  [<ffffffffa13e3bbc>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.377925]  [<ffffffffa138ecd4>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.378344]  [<ffffffffa138df60>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
Oct 11 14:20:08 centos6-14 kernel: [139674.378827]  [<ffffffff8109f82e>] kthread+0x9e/0xc0
Oct 11 14:20:08 centos6-14 kernel: [139674.379313]  [<ffffffff8100c2ca>] child_rip+0xa/0x20
Oct 11 14:20:08 centos6-14 kernel: [139674.379826]  [<ffffffff8109f790>] ? kthread+0x0/0xc0
Oct 11 14:20:08 centos6-14 kernel: [139674.380396]  [<ffffffff8100c2c0>] ? child_rip+0x0/0x20

This is current master plus a few patches, two of them in LNet (LU-7245 and LU-5733), but I think the crash is unrelated to them.

Crashdump failed.



 Comments   
Comment by Joseph Gmitter (Inactive) [ 12/Oct/15 ]

Hi Amir,

This is a crash that Oleg found; it can be treated as low priority at this point.

Thanks.
Joe

Generated at Sat Feb 10 02:07:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.