[LU-13821] Lustre: 2835:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: Created: 25/Jul/20  Updated: 07/Oct/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.5
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Joe Frith
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

RHEL 7.8


Attachments: HTML File debug_client, HTML File debug_server
Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Clients experience request timeouts when multiple rsync jobs are running on them.


On the client - 

Jul 25 08:57:30 zabbix01 kernel: Lustre: 2812:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1595681806/real 1595681806] req@ffff8921e2bcda00 x1673155962380928/t0(0) o36->lustre01-MDT0000-mdc-ffff891ff5c7b800@10.42.34.30@tcp:12/10 lens 488/4528 e 0 to 1 dl 1595681850 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Jul 25 08:57:30 zabbix01 kernel: Lustre: lustre01-MDT0000-mdc-ffff891ff5c7b800: Connection to lustre01-MDT0000 (at 10.42.34.30@tcp) was lost; in progress operations using this service will wait for recovery to complete
Jul 25 08:57:30 zabbix01 kernel: Lustre: lustre01-MDT0000-mdc-ffff891ff5c7b800: Connection restored to 10.42.34.30@tcp (at 10.42.34.30@tcp)
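For reference, the timeout window in the message above can be read straight from the "sent" and "dl" fields; a minimal shell sketch (the log line is trimmed to the relevant fields for illustration):

```shell
# Decode the ptlrpc_expire_one_request() line: "sent" is when the RPC was
# sent and "dl" is its deadline (both Unix timestamps), so dl - sent is how
# long the client waited before reporting the slow reply.
line='[sent 1595681806/real 1595681806] req@ffff8921e2bcda00 x1673155962380928/t0(0) o36 lens 488/4528 e 0 to 1 dl 1595681850 ref 2'
sent=$(printf '%s\n' "$line" | sed -n 's/.*\[sent \([0-9]*\)\/real.*/\1/p')
dl=$(printf '%s\n' "$line" | sed -n 's/.* dl \([0-9]*\) .*/\1/p')
echo "RPC wait window: $((dl - sent)) seconds"
```

For the line above this gives a 44-second window before the request was declared expired.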



On the server - 

Jul 25 08:57:30 lustremds01 kernel: Lustre: lustre01-MDT0000: Client 8e8cc5cc-b257-0497-a475-10e92f1051df (at 130.199.148.189@tcp) reconnecting
Jul 25 08:57:30 lustremds01 kernel: Lustre: lustre01-MDT0000: Connection restored to 5467a4ae-f9e4-bdcf-f38b-0f32f0db3a8d (at 130.199.148.189@tcp)


Also attaching "lctl dk" output for both the server and the client.



 Comments   
Comment by Joe Frith [ 25/Jul/20 ]

Occasionally I also see the following on the server (MDS) side.


Jul 25 09:03:07 lustremds01 kernel: LustreError: 12107:0:(ldlm_lib.c:3279:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff98064af38050 x1673156574796480/t0(0) o37->8e8cc5cc-b257-0497-a475-10e92f1051df@130.199.148.189@tcp:263/0 lens 448/440 e 0 to 0 dl 1595682193 ref 1 fl Interpret:/0/0 rc 0/0

Comment by Joe Frith [ 20/Aug/20 ]

Is there a tunable I can change to avoid this? This is causing high latency and seems to happen often now. 
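Not a confirmed fix, but the tunables usually checked first for slow-reply timeouts are the static obd timeout and the adaptive-timeout bounds; the value below is illustrative only, not a recommendation:

```shell
# Inspect current timeout settings (run on client or server):
lctl get_param timeout          # static obd_timeout
lctl get_param at_min at_max    # adaptive timeout bounds (at_max=0 disables AT)

# Example only: raise the adaptive timeout ceiling to tolerate a slow MDS.
lctl set_param at_max=600
```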

Comment by Joe Frith [ 12/Sep/20 ]

Any update?

Comment by Joe Frith [ 07/Oct/20 ]

Hello, any update on this?
