[LU-5428] LNet: Service thread pid completed after 0.00s (DDN SR34734) Created: 29/Jul/14 Updated: 21/Mar/17 Resolved: 28/Aug/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oz Rentas | Assignee: | Liang Zhen (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
New Installation - Lustre 2.4.3 servers, 1.8.9 Clients |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 15111 | ||||
| Description |
|
This problem was reported against a newly installed system at NOAA (Boulder). The system was idle at the time:

Jul 17 04:53:57 lfs-mds-0-1 kernel: : LNet: Service thread pid 29363 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

The customer states that he observes "LNet: Service thread pid completed after 0.00s" even when the system is idle (it is a pre-production testbed). I also saw the same messages on another idle, newly installed system (Harvard (HMU)). |
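To gauge how often the spurious message appears, the kernel log can be filtered for watchdog lines that report a near-zero duration. The following is a minimal sketch, not part of the Lustre tooling: the function name `find_spurious_warnings`, the `threshold` parameter, and the third sample line are hypothetical, and the regular expression is inferred only from the two message forms quoted in this ticket.

```python
import re

# Matches the two message forms quoted in this ticket:
#   "Service thread pid 29363 completed after 0.00s"
#   "Service thread pid 5040 was inactive for 0.00s"
# The pattern is an assumption based on those two samples only.
PATTERN = re.compile(
    r"Service thread pid (?P<pid>\d+) "
    r"(?:completed after|was inactive for) (?P<secs>\d+(?:\.\d+)?)s"
)


def find_spurious_warnings(lines, threshold=0.01):
    """Return (pid, seconds) for watchdog messages whose duration is below threshold."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match and float(match.group("secs")) < threshold:
            hits.append((int(match.group("pid")), float(match.group("secs"))))
    return hits


sample = [
    "Jul 17 04:53:57 lfs-mds-0-1 kernel: : LNet: Service thread pid 29363 completed after 0.00s.",
    "Feb 21 07:29:59 mmp-2 kernel: Lustre: Service thread pid 5040 was inactive for 0.00s.",
    # Hypothetical line with a plausible long duration, which should not be flagged.
    "kernel: Lustre: Service thread pid 1234 was inactive for 200.00s.",
]

if __name__ == "__main__":
    for pid, secs in find_spurious_warnings(sample):
        print(f"pid {pid}: reported {secs:.2f}s")
```

On a real server, the lines would come from the system log (for example, by iterating over an open log file) rather than from the `sample` list.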
| Comments |
| Comment by Peter Jones [ 30/Jul/14 ] |
|
Liang Could you please advise on this one? Thanks Peter |
| Comment by Liang Zhen (Inactive) [ 31/Jul/14 ] |
|
It is strange that we saw "Service thread pid completed after 0.00s", because the watchdog should complain only if a service thread took too long to finish a request, yet here it reports 0.00s. I think it could be a bug in our watchdog code. I will look into it. |
| Comment by Oz Rentas [ 01/Aug/14 ] |
|
Yes, very strange. I agree.

>> Btw, I guess the system should be still working fine besides these fault warnings?

Thanks, |
| Comment by Oz Rentas [ 11/Aug/14 ] |
|
Any ideas on this one? |
| Comment by Liang Zhen (Inactive) [ 12/Aug/14 ] |
|
Hi, sorry for the late response. I have worked out a patch: http://review.whamcloud.com/11415 |
| Comment by Liang Zhen (Inactive) [ 19/Aug/14 ] |
|
Patch landed to master |
| Comment by Oz Rentas [ 28/Aug/14 ] |
|
Thanks much. Go ahead and close this. |
| Comment by Peter Jones [ 28/Aug/14 ] |
|
Thanks Oz |
| Comment by Rustem Bikboulatov [ 22/Feb/16 ] |
|
Will this patch work with earlier versions of Lustre, for example version 2.1.5? In version 2.1.5 we are seeing the same symptoms:

Feb 21 07:29:59 mmp-2 kernel: Lustre: Service thread pid 5040 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: |
| Comment by Liang Zhen (Inactive) [ 29/Feb/16 ] |
|
Yes, I think it should work for 2.1.5. |
| Comment by Gerrit Updater [ 14/Oct/16 ] |
|
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/23162 |
| Comment by Peter Jones [ 21/Mar/17 ] |
|
Patch will be tracked for landing under |