Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.6, Lustre 2.12.7
- None
- 3
- 9223372036854775807
Description
sanity-quota test_6 started failing on November 13, 2018, with Lustre tag 2.11.56.140, with the error:
[ 1733.308968] LNet: Service thread pid 18400 was inactive for 40.06s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
In sanity-quota test_6, we scan the OST1 dmesg log to see whether the watchdog was triggered. Looking at the logs for https://testing.whamcloud.com/test_sets/9f3095ea-fdc2-11e8-b837-52540065bddc, the dmesg log from OST1 (vm3) contains the LNet error and the stack trace:
[18752.909319] Lustre: DEBUG MARKER: lctl set_param -n osd*.*OS*.force_sync=1
[18795.136287] LNet: Service thread pid 14192 was inactive for 40.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[18795.137958] Pid: 14192, comm: ll_ost_io00_002 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Sat Dec 8 05:52:11 UTC 2018
[18795.138944] Call Trace:
[18795.139235] [<ffffffffc0f2a880>] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
[18795.140051] [<ffffffffc0f2acd3>] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[18795.140837] [<ffffffffc1115308>] qsd_send_dqacq+0x2e8/0x340 [lquota]
[18795.141528] [<ffffffffc1123383>] qsd_acquire+0x8e3/0xcb0 [lquota]
[18795.142183] [<ffffffffc11238d4>] qsd_op_begin0+0x184/0x960 [lquota]
[18795.142838] [<ffffffffc1124312>] qsd_op_begin+0x262/0x4b0 [lquota]
[18795.143571] [<ffffffffc116eac7>] osd_declare_quota+0xd7/0x360 [osd_zfs]
[18795.144322] [<ffffffffc1177ff0>] osd_declare_write_commit+0x3d0/0x7f0 [osd_zfs]
[18795.145083] [<ffffffffc12958d9>] ofd_commitrw_write+0x939/0x1d40 [ofd]
[18795.145833] [<ffffffffc1299de2>] ofd_commitrw+0x4b2/0xa10 [ofd]
[18795.146465] [<ffffffffc0f98d6c>] obd_commitrw+0x9c/0x370 [ptlrpc]
[18795.147178] [<ffffffffc0f9b9dd>] tgt_brw_write+0x100d/0x1a90 [ptlrpc]
[18795.147927] [<ffffffffc0f9f29a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[18795.148649] [<ffffffffc0f4391b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[18795.149488] [<ffffffffc0f4724c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
[18795.150201] [<ffffffff86abdf21>] kthread+0xd1/0xe0
[18795.150788] [<ffffffff871255f7>] ret_from_fork_nospec_end+0x0/0x39
[18795.151437] [<ffffffffffffffff>] 0xffffffffffffffff
[18795.152089] LustreError: dumping log to /tmp/lustre-log.1544552141.14192
There is no other indication of a problem in the console and dmesg logs. We see this issue in both ZFS and ldiskfs environments.
Some of these failures are attributed to LU-11644, but the stack traces do not look the same.
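The dmesg check mentioned in this description can be illustrated with a minimal Python sketch. It is not the code used by sanity-quota (the real test is a shell script whose implementation is not shown in this ticket); the names WATCHDOG_RE and find_watchdog_hits are hypothetical, and the pattern is only assumed from the two message shapes quoted in this ticket.

```python
import re

# Assumed pattern, derived only from the two message shapes quoted in this ticket:
#   "Service thread pid 14192 was inactive for 40.14s"
#   "service thread pid 4147 was inactive for 40.154 seconds"
WATCHDOG_RE = re.compile(
    r"[Ss]ervice thread pid (?P<pid>\d+) was inactive for "
    r"(?P<secs>\d+(?:\.\d+)?)\s*s"
)


def find_watchdog_hits(dmesg_text):
    """Return a list of (pid, inactive_seconds) for each watchdog message."""
    return [
        (int(m.group("pid")), float(m.group("secs")))
        for m in WATCHDOG_RE.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    # Two lines copied from the OST1 dmesg excerpt in this ticket.
    sample = (
        "[18752.909319] Lustre: DEBUG MARKER: lctl set_param -n "
        "osd*.*OS*.force_sync=1\n"
        "[18795.136287] LNet: Service thread pid 14192 was inactive for "
        "40.14s. The thread might be hung, or it might only be slow and "
        "will resume later.\n"
    )
    for pid, secs in find_watchdog_hits(sample):
        print(f"watchdog fired: pid={pid} inactive={secs}s")
```

A non-empty result means the watchdog message is present in the log, which is the condition this ticket reports as the test failure.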
Logs for this failure are at:
https://testing.whamcloud.com/test_sets/e2bf61ea-e78f-11e8-b67f-52540065bddc
https://testing.whamcloud.com/test_sets/bca63f5a-f60e-11e8-bfe1-52540065bddc
https://testing.whamcloud.com/test_sets/613c72d6-f5d9-11e8-bfe1-52540065bddc
Issue Links
- is duplicated by: LU-12749 sanity-quota test_6: FAIL: [22292.915881] Lustre: ll_ost_io00_002: service thread pid 4147 was inactive for 40.154 seconds. (Resolved)
- is related to: LU-11644 LNet: Service thread inactive for 300 causes client evictions (Open)