Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

It looks like "~~LU-9480~~ lnet: implement Peer Discovery" commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 broke lolnd (suggested by git bisect)

This manifests in e.g. sanity test 101b hanging with this in logs:

[  215.914245] Lustre: DEBUG MARKER: == sanity test 101b: check stride-io mode read-ahead ================================================= 01:32:15 (1504675935)
[  215.985320] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0x5:0x0], use llapi_layout_get_by_path()
[  256.717500] LNet: Service thread pid 4032 was inactive for 40.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[  256.720328] Pid: 4032, comm: ll_ost_io00_002
[  256.721561] 
Call Trace:
[  256.723391]  [<ffffffff81704339>] schedule+0x29/0x70
[  256.724533]  [<ffffffff81700972>] schedule_timeout+0x162/0x2a0
[  256.725651]  [<ffffffff810879f0>] ? process_timeout+0x0/0x10
[  256.726859]  [<ffffffffa0534e3e>] target_bulk_io+0x4ee/0xb20 [ptlrpc]
[  256.729276]  [<ffffffff810b7ce0>] ? default_wake_function+0x0/0x20
[  256.730431]  [<ffffffffa05ddf08>] tgt_brw_read+0xf38/0x1870 [ptlrpc]
[  256.731359]  [<ffffffffa01ba4a4>] ? libcfs_log_return+0x24/0x30 [libcfs]
[  256.732387]  [<ffffffffa0579f90>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
[  256.733578]  [<ffffffffa0532800>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[  256.734845]  [<ffffffffa057a102>] ? lustre_pack_reply_flags+0x72/0x1f0 [ptlrpc]
[  256.736719]  [<ffffffffa057a291>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[  256.737931]  [<ffffffffa05dad2b>] tgt_request_handle+0x93b/0x1390 [ptlrpc]
[  256.738981]  [<ffffffffa05853b1>] ptlrpc_server_handle_request+0x251/0xae0 [ptlrpc]
[  256.740764]  [<ffffffffa0589168>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
[  256.741800]  [<ffffffff81706487>] ? _raw_spin_unlock_irq+0x27/0x50
[  256.742938]  [<ffffffffa0588710>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
[  256.743943]  [<ffffffff810a2eda>] kthread+0xea/0xf0
[  256.744963]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
[  256.745913]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[  256.746933]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0

[  256.748798] LustreError: dumping log to /tmp/lustre-log.1504675975.4032
[  269.494952] LustreError: 2624:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff8800720b3e00

Easy to reproduce, just run this on a single node: ONLY=101 REFORMAT=yes sh sanity.sh

Attachments

Issue Links

duplicates

LU-9992 Multi-Rail: use lolnd when sending locally

Resolved

is related to

LU-9992 Multi-Rail: use lolnd when sending locally

Resolved

Activity

People

Assignee:: Amir Shehata (Inactive)

Reporter:: Oleg Drokin

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 06/Sep/17 6:30 AM

Updated:: 10/Feb/19 12:12 AM

Resolved:: 22/Aug/18 5:48 PM