Details
Bug
Resolution: Duplicate
Major
Description
It looks like commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 ("LU-9480 lnet: implement Peer Discovery") broke lolnd, as suggested by git bisect.
This manifests as, e.g., sanity test 101b hanging with the following in the logs:
[ 215.914245] Lustre: DEBUG MARKER: == sanity test 101b: check stride-io mode read-ahead ================================================= 01:32:15 (1504675935)
[ 215.985320] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0x5:0x0], use llapi_layout_get_by_path()
[ 256.717500] LNet: Service thread pid 4032 was inactive for 40.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 256.720328] Pid: 4032, comm: ll_ost_io00_002
[ 256.721561] Call Trace:
[ 256.723391] [<ffffffff81704339>] schedule+0x29/0x70
[ 256.724533] [<ffffffff81700972>] schedule_timeout+0x162/0x2a0
[ 256.725651] [<ffffffff810879f0>] ? process_timeout+0x0/0x10
[ 256.726859] [<ffffffffa0534e3e>] target_bulk_io+0x4ee/0xb20 [ptlrpc]
[ 256.729276] [<ffffffff810b7ce0>] ? default_wake_function+0x0/0x20
[ 256.730431] [<ffffffffa05ddf08>] tgt_brw_read+0xf38/0x1870 [ptlrpc]
[ 256.731359] [<ffffffffa01ba4a4>] ? libcfs_log_return+0x24/0x30 [libcfs]
[ 256.732387] [<ffffffffa0579f90>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
[ 256.733578] [<ffffffffa0532800>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 256.734845] [<ffffffffa057a102>] ? lustre_pack_reply_flags+0x72/0x1f0 [ptlrpc]
[ 256.736719] [<ffffffffa057a291>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[ 256.737931] [<ffffffffa05dad2b>] tgt_request_handle+0x93b/0x1390 [ptlrpc]
[ 256.738981] [<ffffffffa05853b1>] ptlrpc_server_handle_request+0x251/0xae0 [ptlrpc]
[ 256.740764] [<ffffffffa0589168>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
[ 256.741800] [<ffffffff81706487>] ? _raw_spin_unlock_irq+0x27/0x50
[ 256.742938] [<ffffffffa0588710>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
[ 256.743943] [<ffffffff810a2eda>] kthread+0xea/0xf0
[ 256.744963] [<ffffffff810a2df0>] ? kthread+0x0/0xf0
[ 256.745913] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[ 256.746933] [<ffffffff810a2df0>] ? kthread+0x0/0xf0
[ 256.748798] LustreError: dumping log to /tmp/lustre-log.1504675975.4032
[ 269.494952] LustreError: 2624:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff8800720b3e00
Easy to reproduce: just run this on a single node:
ONLY=101 REFORMAT=yes sh sanity.sh
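Since the failure is a hang rather than an error exit, anyone repeating the bisect automatically would probably want a time limit around the reproducer. Below is a rough sketch only, not the procedure actually used for this ticket. The helper name bisect_step and the 600 second limit are made up for illustration, and it assumes GNU coreutils timeout(1), which exits with 124 when it kills the command. git bisect run treats exit 0 as good and 1-127 (except 125) as bad, so the helper folds "failed" and "hung" into exit 1.

```shell
#!/bin/sh
# Hypothetical helper for use under `git bisect run`; not part of the Lustre test suite.
# Usage: bisect_step <seconds> <command> [args...]
# Returns 0 if the command passes within the limit (revision is good),
# 1 if it fails or is killed for running too long (revision is bad).
bisect_step() {
    limit="$1"
    shift
    # timeout(1) exits with 124 if the command is still running after $limit seconds.
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        return 0
    fi
    return 1
}

# Illustrative invocation (left commented out). It assumes the tree has already
# been rebuilt for the revision under test and that the current directory is
# the one containing sanity.sh:
# bisect_step 600 env ONLY=101 REFORMAT=yes sh sanity.sh
```

A hung run may leave the node in a state that needs manual cleanup before the next step, so this is probably only a starting point.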