[LU-9957] Some LNet warning message caused by running IOR Created: 08/Sep/17  Updated: 21/Mar/18  Resolved: 21/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question/Request Priority: Critical
Reporter: sebg-crd-pm (Inactive) Assignee: Peter Jones
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Lustre 2.10.0



 Description   

Hi All,

We have used IOR to stress-test our Lustre file system, which consists of four OSS servers and one combined MGS/MDS server. After a few hours of running, the LNet warning messages listed below appeared in /var/log/messages.

Could you please suggest how we should go about debugging our Lustre file system?

Thanks a lot!

===================================================================
Sep 7 21:00:59 oss1 kernel: LNet: Service thread pid 25482 completed after 41.14s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 7 21:22:03 oss1 kernel: LNet: Service thread pid 1088 completed after 67.46s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 7 21:36:50 oss1 kernel: LNet: Service thread pid 21711 completed after 52.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 7 22:20:42 oss1 kernel: LNet: Service thread pid 28459 completed after 22.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 7 22:41:20 oss1 kernel: LNet: Service thread pid 2743 completed after 52.61s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 7 23:51:26 oss1 kernel: LNet: Service thread pid 406 completed after 50.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 8 00:30:52 oss1 kernel: LNet: Service thread pid 885 completed after 23.67s. This indicates the system was overloaded (too many service threads, or there were not enough hardware
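Before attaching the full log, it can help to count how often these slow-thread warnings occur and how long the threads took on each server. Below is a minimal sketch that does this. It assumes the syslog lines keep exactly the format shown in the excerpt above. The helper names (`parse_slow_threads`, `summarize`, `summarize_file`) are hypothetical and not part of any Lustre tooling. It is only a triage aid and does not replace the complete log.

```python
import re
from statistics import mean

# Assumed format, taken from the excerpt above:
#   "<Mon> <day> <hh:mm:ss> <host> kernel: LNet: Service thread pid <pid> completed after <secs>s."
# Any other LNet/Lustre message is ignored. Truncated lines still match as long
# as the "completed after N.NNs." part survived.
SLOW_THREAD_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"LNet: Service thread pid (?P<pid>\d+) completed after (?P<secs>\d+(?:\.\d+)?)s\."
)


def parse_slow_threads(lines):
    """Return (host, timestamp, pid, seconds) for each slow-thread warning."""
    events = []
    for line in lines:
        m = SLOW_THREAD_RE.search(line)
        if m:
            stamp = f"{m['month']} {m['day']} {m['time']}"
            events.append((m["host"], stamp, int(m["pid"]), float(m["secs"])))
    return events


def summarize(events):
    """Group completion times by host and report count, mean and max seconds."""
    by_host = {}
    for host, _stamp, _pid, secs in events:
        by_host.setdefault(host, []).append(secs)
    return {
        host: {"count": len(times), "mean_s": round(mean(times), 2), "max_s": max(times)}
        for host, times in by_host.items()
    }


def summarize_file(path="/var/log/messages"):
    """Hypothetical convenience wrapper: parse and summarize one log file."""
    with open(path, errors="replace") as fh:
        return summarize(parse_slow_threads(fh))
```

For example, running `summarize_file()` on each OSS shows whether the stalls are spread evenly across servers or concentrated on one of them. That distinction helps separate overall overload from a problem with a single node's hardware.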



 Comments   
Comment by Brad Hoagland (Inactive) [ 08/Sep/17 ]

Hello,

Please attach the entire log for us to review (the snippet does not provide enough information).

Regards,

Brad

Comment by Peter Jones [ 21/Mar/18 ]

As the requested log was not supplied, I am assuming that this is either low priority or was resolved somehow.

Generated at Sat Feb 10 02:30:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.