[LU-12161] Request sent has failed due to network error: [sent 1554382146/real 1554382146] Created: 04/Apr/19 Updated: 09/Apr/19 Resolved: 09/Apr/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Campbell Mcleay (Inactive) | Assignee: | Peter Jones |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Description |
|
Hi,

We're getting messages like this one every 10 minutes or so on all of our clients:

Apr 4 13:49:06 bravo3 kernel: Lustre: 2835:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1554382146/real 1554382146] req@ffff909af965f800 x1629627154208576/t0(0) o8->bravo-OST0000-osc-ffff909676561800@10.21.22.51@tcp:28/4 lens 520/544 e 0 to 1 dl 1554382201 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1

What do these messages mean? Is this something to be concerned about?

Thanks for any help.

Kind regards,
Campbell
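A minimal Python sketch for pulling the numeric fields out of a message like the one above. The regular expression is hypothetical and only covers the layout quoted in this ticket. Reading "sent" and "dl" as Unix epoch seconds (send time and request deadline) and "o8" as an RPC opcode number is an assumption about ptlrpc's request dump, not something stated in the ticket. Under that reading, the sample's send time is 12:49:06 UTC, which lines up with the 13:49:06 syslog stamp if the client clock is on UTC+1, and the request had 55 seconds before its deadline.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: it only covers the ptlrpc message layout quoted in this ticket.
EXPIRED_RE = re.compile(
    r"\[sent (?P<sent>\d+)/real \d+\]"                     # send time (assumed epoch seconds)
    r".*?\bx(?P<xid>\d+)/t\d+\(\d+\)"                      # request xid
    r" o(?P<opc>\d+)->(?P<target>[^@]+)@(?P<nid>[^:]+):"   # opcode, target import, server NID
    r".*?\bdl (?P<deadline>\d+)"                           # deadline (assumed epoch seconds)
)


def parse_expired_request(line: str) -> dict:
    """Extract the fields of a 'Request sent has failed' message (sketch)."""
    m = EXPIRED_RE.search(line)
    if m is None:
        raise ValueError("not a ptlrpc expired-request message")
    sent = int(m["sent"])
    deadline = int(m["deadline"])
    return {
        # Convert the assumed epoch timestamp to an ISO 8601 UTC string.
        "sent_utc": datetime.fromtimestamp(sent, tz=timezone.utc).isoformat(),
        "xid": int(m["xid"]),
        "opcode": int(m["opc"]),
        "target": m["target"],
        "nid": m["nid"],
        # Seconds between sending the request and its deadline.
        "timeout_s": deadline - sent,
    }
```

Passing the full syslog line above to parse_expired_request() returns the target "bravo-OST0000-osc-ffff909676561800", the server NID "10.21.22.51@tcp" and a 55-second window, which makes it easy to see which server the failing requests were addressed to.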
|
| Comments |
| Comment by Campbell Mcleay (Inactive) [ 04/Apr/19 ] |
|
I should mention that there are currently no reads or writes on this cluster from the clients. |
| Comment by Joseph Gmitter (Inactive) [ 04/Apr/19 ] |
|
This may be indicative of a network issue with 10.21.22.51. Do you have logs from OST0000 that you could attach for more information? |
| Comment by Campbell Mcleay (Inactive) [ 04/Apr/19 ] |
|
Hi Joseph, would the message log from the server suffice, or do you need something else? Kind regards, Campbell |
| Comment by Campbell Mcleay (Inactive) [ 04/Apr/19 ] |
|
Most of the failures (1750 out of 1958) are to that one OSS. |
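One way to produce a per-server tally like this is to count the failure messages by server NID in a client's syslog file. A minimal Python sketch; the regular expression is hypothetical and assumes the "->target@NID:" layout seen in this ticket's message.

```python
import re
from collections import Counter
from typing import Iterable

MARKER = "Request sent has failed due to network error"
# Hypothetical: captures the server NID from "...->target@NID:..." in the message.
NID_RE = re.compile(r"->[^@\s]+@(?P<nid>[^:\s]+):")


def failures_by_nid(lines: Iterable[str]) -> Counter:
    """Count expired-request messages per server NID (sketch)."""
    counts: Counter = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # skip unrelated log lines
        m = NID_RE.search(line)
        if m is not None:
            counts[m["nid"]] += 1
    return counts
```

For example, opening /var/log/messages, passing the file object to failures_by_nid() and printing counts.most_common() alongside sum(counts.values()) gives the same kind of "N out of M" breakdown per server.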
| Comment by Patrick Farrell (Inactive) [ 05/Apr/19 ] |
|
Campbell, if the cluster is idle, those are Lustre ping failures. Pinging is a pretty trivial activity for Lustre, so if it's failing (which is what that message means), it's likely some sort of network issue rather than a Lustre problem. 'dmesg' from the node is probably the most helpful source for this, but it may not contain the right info. I'd encourage you to look through the logs and possibly do some other network testing. |
| Comment by Campbell Mcleay (Inactive) [ 08/Apr/19 ] |
|
Thanks Patrick. I checked the interface errors and found none, nor any entries in /var/log/messages that could explain it. I will run some ping/iperf tests and see what the results are. |
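As a complement to ping and iperf, a minimal Python sketch that times plain TCP connections to a server's LNet port. It assumes the default LNet TCP port 988 (a site can configure a different one) and only shows whether a TCP connection can be opened, not whether LNet or the OSTs behind it are healthy. Because it closes each connection without speaking the LNet protocol, the server may log a rejected connection for every attempt.

```python
import socket
import time
from typing import List, Optional

LNET_TCP_PORT = 988  # assumed default LNet TCP port; check your site's configuration


def tcp_probe(host: str, port: int = LNET_TCP_PORT, attempts: int = 5,
              timeout: float = 2.0) -> List[Optional[float]]:
    """Return the connect time in seconds for each attempt, or None if it failed."""
    results: List[Optional[float]] = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Open and immediately close a plain TCP connection; no LNet handshake is sent.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            # Refused connections, timeouts and unreachable hosts are all OSError subclasses.
            results.append(None)
    return results
```

Calling tcp_probe("10.21.22.51") from an affected client and comparing it with a server that is not showing failures is a quick way to tell a reachability problem apart from something on the server itself.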
| Comment by Campbell Mcleay (Inactive) [ 09/Apr/19 ] |
|
Hi Patrick, I did a bit of reading and found the 'lctl ping' command, which alerted me to the fact that the OSTs were not mounted on two of the nodes. This is now fixed, so you can close this ticket. Apologies for the bother. Regards, Campbell |
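Since the cause turned out to be OSTs that were not mounted, a check along these lines could catch it earlier. A minimal Python sketch that reads /proc/mounts on a server, where a mounted Lustre target shows filesystem type "lustre", and reports expected mount points with nothing mounted. The expected mount points (such as "/mnt/ost0") are hypothetical placeholders for whatever the site uses. A client mount of the filesystem also shows type "lustre", so this is meant to be run on the servers.

```python
from typing import Dict, Iterable, List


def mounted_lustre(mounts_lines: Iterable[str]) -> Dict[str, str]:
    """Map mount point -> device for every /proc/mounts entry of type 'lustre'."""
    mounted: Dict[str, str] = {}
    for line in mounts_lines:
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "lustre":
            mounted[fields[1]] = fields[0]
    return mounted


def missing_targets(expected: Iterable[str], mounts_lines: Iterable[str]) -> List[str]:
    """Expected mount points that have no Lustre filesystem mounted on them."""
    mounted = mounted_lustre(mounts_lines)
    return sorted(set(expected) - set(mounted))
```

On each OSS, opening /proc/mounts and passing it to missing_targets() with that server's list of OST mount points returns an empty list when everything is mounted.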
| Comment by Peter Jones [ 09/Apr/19 ] |
|
No problem - thanks for letting us know! |