[LU-9235] LNet: Service thread pid 13033 was inactive for 0.00s. Created: 21/Mar/17  Updated: 16/Aug/17  Resolved: 14/Aug/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug
Priority: Minor
Reporter: Hongchao Zhang
Assignee: Hongchao Zhang
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3

 Description   

The following call trace appeared on one of the OSS servers. Interestingly, the watchdog reports the thread as "inactive for 0.00s" and then "pid 13033 completed after 0.00s".
Oct 11 13:31:28 oss23 kernel: LNet: Service thread pid 13033 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Oct 11 13:31:28 oss23 kernel: Pid: 13033, comm: ll_ost_io01_044
Oct 11 13:31:28 oss23 kernel:
Oct 11 13:31:28 oss23 kernel: Call Trace:
Oct 11 13:31:28 oss23 kernel: [<ffffffffa083dddb>] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Oct 11 13:31:28 oss23 kernel: [<ffffffffa0844f75>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
Oct 11 13:31:28 oss23 kernel: [<ffffffffa04cd4aa>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
Oct 11 13:31:28 oss23 kernel: [<ffffffffa083d8d9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
Oct 11 13:31:28 oss23 kernel: [<ffffffff81064c00>] ? default_wake_function+0x0/0x20
Oct 11 13:31:28 oss23 kernel: [<ffffffffa08476fd>] ? ptlrpc_main+0xadd/0x1770 [ptlrpc]
Oct 11 13:31:28 oss23 kernel: [<ffffffffa0846c20>] ? ptlrpc_main+0x0/0x1770 [ptlrpc]
Oct 11 13:31:28 oss23 kernel: [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
Oct 11 13:31:28 oss23 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Oct 11 13:31:28 oss23 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Oct 11 13:31:28 oss23 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oct 11 13:31:28 oss23 kernel:
Oct 11 13:31:28 oss23 kernel: LustreError: dumping log to /tmp/lustre-log.1476160288.13033
Oct 11 13:31:28 oss23 kernel: LNet: Service thread pid 13033 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
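
To check whether other servers show the same symptom, the watchdog messages can be scanned for implausibly short durations. The following is a minimal sketch, not part of the patch: the function name `find_spurious_watchdog_events`, the 0.01 s threshold, and the pid 4242 sample line are all assumptions made for illustration. Only the two message formats are taken from the log above.

```python
import re

# Matches the two LNet watchdog messages seen in the log above:
#   "Service thread pid N was inactive for X.XXs"
#   "Service thread pid N completed after X.XXs"
WATCHDOG_RE = re.compile(
    r"Service thread pid (?P<pid>\d+) "
    r"(?P<event>was inactive for|completed after) "
    r"(?P<secs>\d+(?:\.\d+)?)s"
)


def find_spurious_watchdog_events(lines, threshold=0.01):
    """Return (pid, event, seconds) for watchdog messages whose reported
    duration is below `threshold` seconds (hypothetical cutoff)."""
    hits = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m is None:
            continue  # not a watchdog message
        secs = float(m.group("secs"))
        if secs < threshold:
            event = "inactive" if m.group("event").startswith("was") else "completed"
            hits.append((int(m.group("pid")), event, secs))
    return hits


sample = [
    # Shortened copies of the two messages from this report.
    "Oct 11 13:31:28 oss23 kernel: LNet: Service thread pid 13033 was inactive for 0.00s.",
    "Oct 11 13:31:28 oss23 kernel: LNet: Service thread pid 13033 completed after 0.00s.",
    # Hypothetical line with a plausible stall duration; it should not be flagged.
    "Oct 11 13:35:00 oss23 kernel: LNet: Service thread pid 4242 was inactive for 200.00s.",
]

for pid, event, secs in find_spurious_watchdog_events(sample):
    print(f"pid {pid}: {event} reported after {secs:.2f}s")
```

Run against a syslog file, this would list only the entries where the reported time is effectively zero, which is the pattern described in this report.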



 Comments   
Comment by Minh Diep [ 24/Mar/17 ]

Patch for this is https://review.whamcloud.com/#/c/23162/

Comment by Gerrit Updater [ 13/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23162/
Subject: LU-9235 libcfs: don't dump stack if just touched
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1376094062c1e46c985f04d82821114d84329699

Comment by Peter Jones [ 14/Aug/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 14/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28520
Subject: LU-9235 libcfs: don't dump stack if just touched
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 2242b3bf438e9bf9d593a465ece5e2f1b7833452

Comment by Gerrit Updater [ 16/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28520/
Subject: LU-9235 libcfs: don't dump stack if just touched
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 6fc678dc265a3c321108ab083d3448fab378a135
